This document describes the MIPS R4000 RISC-based microprocessor.
The chapters and appendices of this book are grouped in the following
way:

• Architecture 

• Implementation Details 

• Electrical and Physical Specifications 

• Instruction Set Summaries 

Chapter 1 is a general discussion (including a historical context) of the
RISC microprocessor in general and the R4000 in particular.

Chapter 2 provides an overview of the CPU instruction set by
summarizing each instruction category in a table.

Chapter 3 describes the operation of the R4000 instruction execution
pipeline. It describes the basic operation of the pipeline and
interruptions to the pipeline flow caused by interlocks and exceptions.

Chapter 4 is a discussion of the memory management system,
including memory mapping, virtual memory, and address
translation.

Chapter 5 is a discussion of the exception processing resources and
capabilities of the R4000. It presents an overview of the CPU exception
handling process and describes the format and use of each CPU
exception handling register.

Chapter 6 is a discussion of the Floating-Point Unit (FPU). The FPU is
a coprocessor for the Central Processing Unit (CPU) that extends the
CPU instruction set to perform floating-point arithmetic operations.

Chapter 7 is a discussion of the Floating-Point Unit's exception
processing.

Chapter 8 is a discussion of the signals that comprise the interface
between the R4000 and other components in the system. The signals
discussed include the System Interface, the Clock/Control Interface,
the Secondary Cache Interface, the Interrupt Interface, the
Initialization Interface, and the JTAG Interface.

Chapter 9 is a discussion of the system interface. The system interface
allows the processor access to external resources such as memory and
I/O. It also allows an external agent access to certain processor
internal resources.

Chapter 10 is a discussion of the clocks used in the R4000 and the
processor status reporting mechanism. The topics covered include the
basic System clocks, interfacing to a Phase-Locked system, interfacing
to a system without Phase Locking, and processor Status Outputs.

Chapter 11 is a discussion of the cache memory hierarchy, the
operation of the primary and secondary caches, and the R4000's
interface to the secondary cache. It also discusses cache-coherent
operation in a multiprocessor system.

Chapter 12 is a discussion of the Initialization interface. The
fundamental, or 'start-up', operational modes for the processor are
introduced to the processor through the initialization interface.

Chapter 13 is a discussion of the JTAG interface. The JTAG boundary
scan mechanism provides a capability for testing the interconnection
between the R4000 processor, the printed circuit board to which it is
attached, and the other components on the board.

Chapter 14 is a discussion of the six hardware, two software, and one
non-maskable processor interrupts.

Chapter 15 is a discussion of the Error Checking and Correcting (ECC)
mechanisms of the R4000.

Chapter 16 is a discussion of the electrical and physical characteristics
of the R4000.

Appendix A is a detailed description of the operation of each R4000
instruction in both 32- and 64-bit modes. The instructions are listed in
alphabetical order.

Appendix B is a detailed description of the operation of each FPU
instruction. The instructions are listed alphabetically.

Appendix C is a discussion of the Single Error Correcting Double
Error Detecting (SECDED) codes. These are the codes chosen for the
processor's secondary cache data and secondary cache tag.



R4000 User's Manual-Preliminary



Preface 

Appendix D is a discussion of sub-block ordering. Sub-block ordering
is an order for the transmission of data elements that form a block of
data when the first transmitted data element is not the data element at
the beginning of the block.

Appendix E is a discussion of the output buffer Δi/Δt control
mechanism, which controls the speed of the R4000 output driver,
ensuring drive-off times are only as fast as necessary to meet
the system requirement of single-cycle transfers.

Appendix F is a discussion of the passive components which comprise
the Phase-Locked Loop (PLL).

Appendix G is a description of Coprocessor hazards.

This introductory chapter provides you with the following
information:

• An explanation of RISC architecture, with subsections 
describing the benefits of using RISC design, the relationship 
between RISC architecture and optimizing compilers, and a 
description of the MIPS compiler family. 

• An overview of the R4000 features, including the Memory 
Management System, pipeline architecture, memory hierarchy, 
and interfaces to external cache memory and the remainder of 
the system. 



What Is RISC? 



Historically, the evolution of computer architectures has been
dominated by families of increasingly complex central processors.
Under market pressures to preserve existing software, Complex
Instruction Set Computer (CISC) architectures evolved by the
accretion of microcode and increasingly intricate instruction sets. This
intricacy in architecture was itself driven by the need to support
high-level languages (HLLs) and operating systems, as advances in
semiconductor technology made it possible to fabricate integrated
circuits of greater and greater complexity. At the time, it seemed
self-evident to designers that architectures should become more
complex as technological advances made such VLSI designs possible.

In recent years, however, Reduced Instruction Set Computer (RISC)
architectures have implemented a different model for the interaction
between hardware, firmware, and software. RISC concepts emerged
from a statistical analysis of the manner in which software actually
uses processor resources: dynamic measurement of system kernels
and object modules generated by optimizing compilers showed that
the simplest instructions were used most often, even in the code for
CISC machines. Correspondingly, complex instructions were often
unused because their single way of performing a complex operation
rarely matched the precise needs of the high-level language.

RISC, on the other hand, eliminated microcode routines and turned
low-level control of the machine over to software. The RISC approach
was not new, but its application became more universal in recent
years, due to the increasing prevalence of high-level languages, the
development of compilers able to optimize at the microcode level,
and dramatic advances in semiconductor memory and packaging. It
is now feasible to replace a machine's relatively slow microcode ROM
with faster RAM, organized as an instruction cache. Machine control
then resides in this instruction cache that is, in effect, customized on
the fly: the instruction stream generated by system- and
compiler-generated code provides a precise fit between the
requirements of high-level software and the low-level capabilities of
the hardware.

Reducing or simplifying the instruction set was not the primary goal
of RISC architecture; it is a pleasant side effect of techniques used to
gain the highest performance possible from available technology.
Thus, the term Reduced Instruction Set Computer is a bit misleading: it
is the push for performance that really drives and shapes RISC
designs.

Benefits of RISC Design 

Some of the benefits that result from RISC design techniques are not 
directly attributable to the drive to increase performance, but are a 
result of the basic reduction in complexity: a simpler design allows
both chip-area resources and human resources to be applied to 
features that enhance performance. Some of these benefits are 
described below. 

Shorter Design Cycle 

The architectures of RISC processors can be implemented more 
quickly than their CISC counterparts: it is easier to fabricate and 
debug a streamlined, simplified architecture with no microcode than 
a complex, microcoded architecture. CISC processors have such a 
long design cycle that they may not be completely debugged by the 
time they have been rendered technologically obsolete. The shorter 
time required to design and implement RISC processors allows them 
to make use of the best available technologies.

Effective Utilization of Chip Area

The simplicity of RISC processors also frees scarce chip geography for 
performance-critical resources such as larger register files, Translation 
Lookaside Buffers (TLBs), coprocessors, and fast multiply and divide 
units. Such resources help RISC processors obtain an even greater 
performance edge. 

User (Programmer) Benefits 

Simplicity in architecture also helps the user in the following ways: 

• A uniform instruction set is easier to use. 

• A closer correlation is made possible between the
instruction count and the cycle count, making it easier to
measure code optimization activities.

Advanced Semiconductor Technologies 

Each new VLSI technology (ECL, GaAs) is introduced with tight 
limits on the number of transistors that can be fit on each chip. Since 
the simplicity of a RISC processor allows it to be implemented in fewer 
transistors than its CISC counterpart, the first computers capable of 
exploiting these new VLSI technologies have been using and will 
continue to use RISC architecture. 

Optimizing Compilers 

RISC architecture is designed so that compilers, not assembly
languages, have the optimal working environment. RISC philosophy
assumes that high-level language (HLL) programming is used, a
philosophy in contrast to the older CISC philosophy developed when
assembly language programming was of primary importance.

The trend toward HLL instructions has led to the development of
more efficient compilers to convert HLL instructions to machine code.
Primary measures of a compiler's efficiency are:

• the compactness of its generated code 

• the shortness of its execution time 

Optimizing compilers and RISC architectures have a synergistic 
relationship; compilers perform their best job of optimization in a 
RISC environment. Reciprocally, RISC architectures rely on compilers 
to obtain their best performance.

During the development of more efficient compilers, an analysis of
instruction streams revealed that the greatest amount of time was
spent

• executing simple instructions 

• performing load and store operations 

while the more complex instructions were used less frequently.

It was also learned that compilers produce code that is often a narrow
subset of the processor's instruction set architecture (ISA). A compiler
prefers instructions that perform simple, well-defined operations and
generate minimal side effects. Complex instructions and features are
simply not used by compilers; the more complex, powerful instructions
are either too difficult for the compiler to use or do not precisely fit
HLL requirements.

Thus, a natural match exists between RISC architectures and efficient, 
optimizing compilers. This match makes it easier for compilers to 
generate the most effective sequences of machine instructions to 
accomplish tasks defined by the high-level language. 

Family of Compilers 

Many compiler products — especially those designed for 
microprocessors — are cobbled from various sources and do not 
necessarily fit together very well. However, the MIPS language suite 
approach shares common elements across the family of compilers 
instead of treating each language's compiler as a separate entity. In 
this way the MIPS suite of compilers, RISCompilers™, can offer both 
tight integration and broad language coverage. 
The MIPS suite of compilers does the following: 

• Provides industry-standard front ends for six languages (C,
FORTRAN, Pascal, Ada, PL/I, COBOL)

• Uses a common intermediate language, thus offering an 
efficient way to add language front ends over time 

• Shares all of the back end optimization and code 
generation 

• Uses the same object format and calling conventions 

• Supports mixed-language programs cleanly 

• Supports debugging of programs written in all languages, 
including mixtures

This language suite approach yields high-quality compilers for all
languages, since common elements make up the majority of each of
the language products. In addition, the ability to develop and execute
multi-language programs is provided, promoting flexibility in
development, avoiding recoding of proven program segments, and
protecting the user's software investment. The common back end also
exports optimizing and code-generating improvements immediately
throughout the suite of RISCompilers, thereby reducing maintenance.



64-bit Architecture 



The MIPS R4000 family of RISC microprocessors consists of
high-performance 32-bit and 64-bit processors; the natural mode of
operation for the R4000 is as a 64-bit microprocessor. It can, however,
be programmed to operate as a 32-bit processor.

The R4000 provides a 64-bit on-chip floating-point unit (FPU), a 64-bit
integer ALU, 64-bit integer registers, and a 64-bit virtual address
space. 32-bit applications maintain compatibility even when the
processor operates as a 64-bit processor.

The R4000 Processor

The R4000 has many features that differ from the R2000/R3000 
processor family. In addition to a high-performance integer unit, the 
R4000 contains: 

• a 48-entry fully-associative on-chip TLB, with two pages 
mapped to each entry 

• separate on-chip primary data and instruction caches 

• an optional off-chip secondary cache 

• an on-chip FPU 

Figure 1-1 shows a block diagram of the R4000.

Figure 1-1 R4000 Internal Block Diagram

Processor General Features 

This section briefly describes the programming model, the MMU, and
the caches in the R4000. A more detailed description is given in
succeeding sections.

• Full 32-bit and 64-bit Operation. The R4000 contains
thirty-two general-purpose 64-bit registers. (When
operating as a 32-bit processor, the general-purpose
registers are 32 bits wide.) All instructions are thirty-two
bits wide.

• Efficient Pipelining. The superpipeline design of the
processor results in an execution rate approaching one
instruction per cycle. Pipeline stalls and exceptional events
are handled precisely and efficiently.

• MMU. The R4000 processor uses an on-chip TLB that
provides rapid virtual-to-physical address translation of:

- 2-Gbyte user virtual address space in 32-bit mode

- 512-Gbyte user virtual address space in 64-bit mode.

• Cache Control. The R4000 primary instruction and data
caches reside on-chip, and can each hold from 8 Kbytes to
32 Kbytes. An off-chip secondary cache can hold from 128
Kbytes to 4 Mbytes. All R4000 cache control logic, including
the secondary cache control logic, is on-chip.

• Floating-Point Unit. The FPU is located on-chip and
implements the ANSI/IEEE Standard 754-1985.



CPU Registers 



The CPU provides thirty-two general-purpose registers, a Program
Counter (PC), and two registers that hold the results of integer
multiply and divide operations. These registers are either 32 bits or
64 bits wide, depending on the mode of operation. Two
general-purpose registers have special functions:

• r0 is hardwired to a value of zero. r0 can be used as the
target register for any instruction whose result can
be discarded. r0 can also be used as a source when a zero
value is needed.

• r31 is the link register for JumpAndLink instructions. It
should not be used explicitly by other instructions.
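
The special behavior of the zero register can be illustrated with a small software model. This is only a sketch: the `RegisterFile` class and its methods are hypothetical names introduced here for illustration and are not part of the R4000 documentation.

```python
class RegisterFile:
    """Toy model of thirty-two general-purpose registers (hypothetical helper)."""

    def __init__(self, width_bits=64):
        # Registers are 32 or 64 bits wide depending on the operating mode.
        self.mask = (1 << width_bits) - 1
        self.regs = [0] * 32

    def read(self, n):
        return self.regs[n]

    def write(self, n, value):
        if n == 0:
            return  # register 0 is hardwired to zero; the result is discarded
        self.regs[n] = value & self.mask


rf = RegisterFile()
rf.write(0, 1234)        # silently discarded
rf.write(31, 0x400008)   # e.g. a return address saved by a JumpAndLink
```

Reading register 0 after any write still returns zero, which is why it can serve both as a discard target and as a constant-zero source.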

The MIPS architecture defines three special registers whose use or 
modification is implicit with certain instructions. These special 
registers are: 

• PC Program Counter 

• HI Multiply and Divide Register higher result 

• LO Multiply and Divide Register lower result 

The two Multiply and Divide Registers (HI, LO) store the doubleword, 
64-bit result or quadword, 128-bit result of integer multiply operations 
and the quotient (in LO) and remainder (in HI) of integer divide 
operations.

Figure 1-2 shows the CPU Registers.

    General-Purpose Registers      Multiply and Divide Registers
    63        31        0          63        31        0
    r0                             HI
    r1                             LO
    r2
    ...                            Program Counter
    r29                            63        31        0
    r30                            PC
    r31

    Register width depends on mode of operation: 32-bit or 64-bit

Figure 1-2 CPU Registers
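
The way the HI and LO registers receive multiply and divide results, as described above, can be sketched for the 32-bit case. The function names are hypothetical, and the choice to truncate the quotient toward zero is an assumption labeled in the code rather than something stated in this chapter; division by zero is not modeled.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x):
    """Interpret the low 32 bits of x as a two's-complement value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def mult32(rs, rt):
    """Signed 32x32 multiply: the 64-bit product is split into (HI, LO)."""
    product = (to_signed32(rs) * to_signed32(rt)) & 0xFFFFFFFFFFFFFFFF
    return (product >> 32) & MASK32, product & MASK32


def div32(rs, rt):
    """Signed divide: remainder goes to HI, quotient to LO.

    Assumption: the quotient is truncated toward zero. rt must be nonzero.
    """
    a, b = to_signed32(rs), to_signed32(rt)
    q = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        q = -q
    r = a - q * b
    return r & MASK32, q & MASK32


hi, lo = mult32(0x10000, 0x10000)   # product is 2**32, so HI = 1 and LO = 0
rem, quo = div32(7, 2)              # HI holds remainder 1, LO holds quotient 3
```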

The R4000 has no Program Status Word (PSW) Register; its functions
are provided by the Status and Cause Registers incorporated within
Coprocessor 0 (CP0). CP0 registers are described later in this chapter.

CPU Instruction Set Overview 

Each CPU instruction is thirty-two bits long. As shown in Figure 1-3,
there are three instruction formats: immediate (I-type), jump (J-type),
and register (R-type). Using only these three instruction formats
simplifies instruction decoding; more complicated (and less
frequently used) operations and addressing modes can be synthesized
by the compiler using sequences of these simple instructions.

Figure 1-3 CPU Instruction Formats
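
To show why three fixed formats keep decoding simple, the sketch below extracts every field from a 32-bit instruction word with shifts and masks. The field positions (6-bit op, 5-bit rs/rt/rd/shamt, 6-bit funct, 16-bit immediate, 26-bit target) follow the commonly documented MIPS encoding and are an assumption here, since the figure itself is not reproduced in this text; `decode` is a hypothetical helper name.

```python
def decode(word):
    """Split a 32-bit instruction word into the fields of all three formats.

    Which fields are meaningful depends on the format selected by `op`:
    I-type uses op/rs/rt/immediate, J-type uses op/target,
    R-type uses op/rs/rt/rd/shamt/funct.
    """
    return {
        "op": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "shamt": (word >> 6) & 0x1F,
        "funct": word & 0x3F,
        "immediate": word & 0xFFFF,
        "target": word & 0x03FFFFFF,
    }


i_type = decode(0x24220004)  # ADDIU r2, r1, 4
r_type = decode(0x00221821)  # ADDU  r3, r1, r2
j_type = decode(0x08100000)  # J with a 26-bit target field of 0x100000
```

Because every format shares the same op position and the same register field positions, the decoder never has to look at more than one word or guess at instruction length.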



The instruction set can be divided into the following groups: 

• Load and Store instructions move data between memory 
and general registers. They are all I-type instructions, since 
the only addressing mode supported is base register plus 
16-bit, signed immediate offset. 

• Computational instructions perform arithmetic, logical, 
shift, multiply, and divide operations on values in registers. 
They occur in both R-type (both the operands and the 
result are stored in registers) and I-type (one operand is a 
16-bit immediate value) formats. 

• Jump and Branch instructions change the control flow of a 
program. Jumps are always to a paged, absolute address 
formed by combining a 26-bit target address with the high- 
order bits of the Program Counter (J-type format) or 
register addresses (R-type format). Branches have 16-bit 
offsets relative to the program counter (I-type). 
JumpAndLink instructions save a return address in register 
31. 

• Coprocessor instructions perform operations in the
coprocessors. Coprocessor load and store instructions are
I-type (see the FPU instructions in Chapter 6).

• Coprocessor 0 instructions perform operations on CP0
registers to manipulate the memory management and
exception handling facilities of the processor. Table 1-3
shows these instructions.

• Special instructions perform system calls and breakpoint
operations. These instructions are always R-type.

• Exception instructions cause a branch to the general
exception-handling vector based upon the result of a
comparison. These instructions occur in both R-type (both
the operands and the result are registers) and I-type (one
operand is a 16-bit immediate value) formats.

A more detailed summary is provided in Chapter 2, and a complete
description of each instruction is given in Appendix A.

Table 1-1 lists the instruction set (ISA) common to all MIPS R-Series
processors; Table 1-2 lists R4000 instructions that are extensions to the
ISA. These instructions result in code space reductions,
multiprocessor support, and improved performance in operating
system kernel code sequences and in situations where run-time
bounds checking is frequently performed.
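
The address calculations named in the instruction groups above (base register plus signed 16-bit offset, paged absolute jumps, and PC-relative branches) can be sketched as follows for 32-bit addresses. Two details are assumptions taken from the usual MIPS definition rather than from this chapter: the jump target and branch offset are word counts (shifted left by two bits), and the PC value used is the address of the instruction following the jump or branch. All function names are hypothetical.

```python
def sign_extend16(imm):
    """Sign-extend a 16-bit immediate to a Python integer."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def load_store_address(base, offset16):
    """Base register plus 16-bit signed immediate offset."""
    return (base + sign_extend16(offset16)) & 0xFFFFFFFF


def jump_target(next_pc, target26):
    """High-order PC bits combined with the 26-bit target (assumed shifted by 2)."""
    return (next_pc & 0xF0000000) | ((target26 & 0x03FFFFFF) << 2)


def branch_target(next_pc, offset16):
    """16-bit signed offset (assumed in words) relative to the program counter."""
    return (next_pc + (sign_extend16(offset16) << 2)) & 0xFFFFFFFF


ea = load_store_address(0x1000, 0xFFFC)     # offset is -4
jt = jump_target(0x80001004, 0x0100000)     # stays in the same 256-Mbyte page
bt = branch_target(0x1004, 0xFFFF)          # branch back by one instruction
```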
Table 1-1 CPU Instruction Set (ISA)

    OP        Description

    Load and Store Instructions
    LB        Load Byte
    LBU       Load Byte Unsigned
    LH        Load Halfword
    LHU       Load Halfword Unsigned
    LW        Load Word
    LWL       Load Word Left
    LWR       Load Word Right
    SB        Store Byte
    SH        Store Halfword
    SW        Store Word
    SWL       Store Word Left
    SWR       Store Word Right

    Arithmetic Instructions (ALU Immediate)
    ADDI      Add Immediate
    ADDIU     Add Immediate Unsigned
    SLTI      Set on Less Than Immediate
    SLTIU     Set on Less Than Immediate Unsigned
    ANDI      AND Immediate
    ORI       OR Immediate
    XORI      Exclusive OR Immediate
    LUI       Load Upper Immediate

    Arithmetic Instructions (3-operand, R-type)
    ADD       Add
    ADDU      Add Unsigned
    SUB       Subtract
    SUBU      Subtract Unsigned
    SLT       Set on Less Than
    SLTU      Set on Less Than Unsigned
    AND       AND
    OR        OR
    XOR       Exclusive OR
    NOR       NOR

    Shift Instructions
    SLL       Shift Left Logical
    SRL       Shift Right Logical
    SRA       Shift Right Arithmetic
    SLLV      Shift Left Logical Variable
    SRLV      Shift Right Logical Variable
    SRAV      Shift Right Arithmetic Variable

    Multiply and Divide Instructions
    MULT      Multiply
    MULTU     Multiply Unsigned
    DIV       Divide
    DIVU      Divide Unsigned
    MFHI      Move From HI
    MTHI      Move To HI
    MFLO      Move From LO
    MTLO      Move To LO

    Jump and Branch Instructions
    J         Jump
    JAL       Jump And Link
    JR        Jump Register
    JALR      Jump And Link Register
    BEQ       Branch on Equal
    BNE       Branch on Not Equal
    BLEZ      Branch on Less Than or Equal to Zero
    BGTZ      Branch on Greater Than Zero
    BLTZ      Branch on Less Than Zero
    BGEZ      Branch on Greater Than or Equal to Zero
    BLTZAL    Branch on Less Than Zero And Link
    BGEZAL    Branch on Greater Than or Equal to Zero And Link

    Coprocessor Instructions
    LWCz      Load Word to Coprocessor z
    SWCz      Store Word from Coprocessor z
    MTCz      Move To Coprocessor z
    MFCz      Move From Coprocessor z
    CTCz      Move Control to Coprocessor z
    CFCz      Move Control From Coprocessor z
    COPz      Coprocessor Operation z
    BCzT      Branch on Coprocessor z True
    BCzF      Branch on Coprocessor z False

    Special Instructions
    SYSCALL   System Call
    BREAK     Break

Table 1-2 Extensions to the ISA

    OP        Description

    Load and Store Instructions
    LD        Load Doubleword
    LDL       Load Doubleword Left
    LDR       Load Doubleword Right
    LL        Load Linked
    LLD       Load Linked Doubleword
    LWU       Load Word Unsigned
    SC        Store Conditional
    SCD       Store Conditional Doubleword
    SD        Store Doubleword
    SDL       Store Doubleword Left
    SDR       Store Doubleword Right
    SYNC      Sync

    Arithmetic Instructions (ALU Immediate)
    DADDI     Doubleword Add Immediate
    DADDIU    Doubleword Add Immediate Unsigned

    Arithmetic Instructions (3-operand, R-type)
    DADD      Doubleword Add
    DADDU     Doubleword Add Unsigned
    DSUB      Doubleword Subtract
    DSUBU     Doubleword Subtract Unsigned

    Shift Instructions
    DSLL      Doubleword Shift Left Logical
    DSRL      Doubleword Shift Right Logical
    DSRA      Doubleword Shift Right Arithmetic
    DSLLV     Doubleword Shift Left Logical Variable
    DSRLV     Doubleword Shift Right Logical Variable
    DSRAV     Doubleword Shift Right Arithmetic Variable
    DSLL32    Doubleword Shift Left Logical + 32
    DSRL32    Doubleword Shift Right Logical + 32
    DSRA32    Doubleword Shift Right Arithmetic + 32

    Multiply and Divide Instructions
    DMULT     Doubleword Multiply
    DMULTU    Doubleword Multiply Unsigned
    DDIV      Doubleword Divide
    DDIVU     Doubleword Divide Unsigned

    Jump and Branch Instructions
    BEQL      Branch on Equal Likely
    BNEL      Branch on Not Equal Likely
    BLEZL     Branch on Less Than or Equal to Zero Likely
    BGTZL     Branch on Greater Than Zero Likely
    BLTZL     Branch on Less Than Zero Likely
    BGEZL     Branch on Greater Than or Equal to Zero Likely
    BLTZALL   Branch on Less Than Zero And Link Likely
    BGEZALL   Branch on Greater Than or Equal to Zero And Link Likely
    BCzTL     Branch on Coprocessor z True Likely
    BCzFL     Branch on Coprocessor z False Likely

    Exception Instructions
    TGE       Trap if Greater Than or Equal
    TGEU      Trap if Greater Than or Equal Unsigned
    TLT       Trap if Less Than
    TLTU      Trap if Less Than Unsigned
    TEQ       Trap if Equal
    TNE       Trap if Not Equal
    TGEI      Trap if Greater Than or Equal Immediate
    TGEIU     Trap if Greater Than or Equal Immediate Unsigned
    TLTI      Trap if Less Than Immediate
    TLTIU     Trap if Less Than Immediate Unsigned
    TEQI      Trap if Equal Immediate
    TNEI      Trap if Not Equal Immediate

    Coprocessor Instructions
    DMFCz     Doubleword Move From Coprocessor z
    DMTCz     Doubleword Move To Coprocessor z
    LDCz      Load Double Coprocessor z
    SDCz      Store Double Coprocessor z
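
The "+ 32" doubleword shifts in Table 1-2 can be illustrated with a short model. The sketch assumes that the shift-amount field `sa` is a 5-bit value (0 to 31) and that the effective shift is `sa + 32`, which is how the mnemonic descriptions read; the function names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def dsll32(value, sa):
    """Doubleword Shift Left Logical + 32: shift left by sa + 32 bits."""
    return (value << (sa + 32)) & MASK64


def dsrl32(value, sa):
    """Doubleword Shift Right Logical + 32: zero-filling right shift."""
    return (value & MASK64) >> (sa + 32)


def dsra32(value, sa):
    """Doubleword Shift Right Arithmetic + 32: sign-filling right shift."""
    value &= MASK64
    if value >> 63:
        value -= 1 << 64  # reinterpret as a negative two's-complement number
    return (value >> (sa + 32)) & MASK64


top_bit = dsll32(1, 31)   # moves bit 0 up to bit 63
```

With a 5-bit field, the plain doubleword shifts can only express amounts 0 to 31; the "+ 32" variants cover 32 to 63.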

Table 1-3 CP0 Instructions

    OP        Description
    DMFC0     Doubleword Move From CP0
    DMTC0     Doubleword Move To CP0
    MTC0      Move to CP0
    MFC0      Move from CP0
    TLBR      Read Indexed TLB Entry
    TLBWI     Write Indexed TLB Entry
    TLBWR     Write Random TLB Entry
    TLBP      Probe TLB for Matching Entry
    ERET      Exception Return



Data Formats and Addressing 



The R4000 uses four data formats: a 64-bit doubleword, a 32-bit word,
a 16-bit halfword, and an 8-bit byte. The byte ordering is configurable
as either Big-endian or Little-endian format. Endianness refers to the
location of byte 0 within a multi-byte structure.

Figure 1-4 and Figure 1-5 show the ordering of bytes within words
and the ordering of words within multiple-word structures for the
Big-endian and Little-endian conventions.

When the R4000 is configured as a Big-endian system, byte 0 is the
most-significant (leftmost) byte, thereby providing compatibility with
MC 68000® and IBM 370® conventions. This configuration is shown in
Figure 1-4.

    Big Endian
                  Bit: 31    24 23    16 15     8 7      0    Word
                                                              Address
    Higher address    |    8   |    9   |   10   |   11   |     8
                      |    4   |    5   |    6   |    7   |     4
    Lower address     |    0   |    1   |    2   |    3   |     0

    • Most-significant byte is at lowest address.

    • Word is addressed by byte address of most-significant byte.

Figure 1-4 Addresses of Bytes within Words: Big-endian Byte Alignment

When configured as a Little-endian system, byte 0 is always the
least-significant (rightmost) byte, which is compatible with iAPX® x86
and DEC VAX® conventions. This configuration is shown in Figure 1-5.

    Little Endian
                  Bit: 31    24 23    16 15     8 7      0    Word
                                                              Address
    Higher address    |   11   |   10   |    9   |    8   |     8
                      |    7   |    6   |    5   |    4   |     4
    Lower address     |    3   |    2   |    1   |    0   |     0

    • Least-significant byte is at lowest address.

    • Word is addressed by byte address of least-significant byte.

Figure 1-5 Addresses of Bytes within Words: Little-endian Byte Alignment

In this book, bit 0 is always the least-significant (rightmost) bit; thus,
bit designations are always Little Endian (although no instructions
explicitly designate bit positions within words).

Figure 1-6 and Figure 1-7 show byte alignment in doublewords.
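
The two byte-ordering conventions can be compared with a small sketch that picks out the byte stored at a given byte address within a 32-bit word. `byte_in_word` is a hypothetical helper written for this illustration only.

```python
def byte_in_word(word, byte_addr, big_endian):
    """Return the byte of a 32-bit word that lives at byte_addr (low two bits)."""
    n = byte_addr & 3
    # Big-endian: byte 0 is the most-significant byte (bits 31..24).
    # Little-endian: byte 0 is the least-significant byte (bits 7..0).
    shift = 8 * (3 - n) if big_endian else 8 * n
    return (word >> shift) & 0xFF


word = 0x11223344
be_byte0 = byte_in_word(word, 0, big_endian=True)    # most-significant byte
le_byte0 = byte_in_word(word, 0, big_endian=False)   # least-significant byte
```

The same word value therefore yields a different byte at address 0 depending on the configured convention, while the bit numbering within the word stays the same.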

    Big Endian
                  Bit: 63                                   0    Doubleword
                  Byte#                                          Address
    Higher address    | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 |     16
                      |  8 |  9 | 10 | 11 | 12 | 13 | 14 | 15 |      8
    Lower address     |  0 |  1 |  2 |  3 |  4 |  5 |  6 |  7 |      0

    • Most-significant byte is at lowest address.

    • Word is addressed by byte address of most-significant byte.

Figure 1-6 Addresses of Bytes within Doublewords: Big-endian Byte Alignment

    Little Endian
                  Bit: 63                                   0    Doubleword
                  Byte#                                          Address
    Higher address    | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 |     16
                      | 15 | 14 | 13 | 12 | 11 | 10 |  9 |  8 |      8
    Lower address     |  7 |  6 |  5 |  4 |  3 |  2 |  1 |  0 |      0

    • Least-significant byte is at lowest address.

    • Word is addressed by byte address of least-significant byte.

Figure 1-7 Addresses of Bytes within Doublewords: Little-endian Byte Alignment

The CPU uses byte addressing for halfword, word, and doubleword 
accesses with the following alignment constraints: 

• Halfword accesses must be aligned on an even byte 
boundary (0, 2, 4...) 

• Word accesses must be aligned on a byte boundary 
divisible by four (0, 4, 8...) 

• Doubleword accesses must be aligned on a byte boundary 
divisible by eight (0, 8, 16...). 

As shown in Figure 1-6 and Figure 1-7, the address of a multiple-byte 
data item is the address of the most-significant byte on a Big-endian 
configuration, or the address of the least-significant byte on a Little- 
endian configuration. 
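
The alignment constraints listed above amount to a simple divisibility check, sketched below. The `ACCESS_SIZES` table and `is_aligned` function are hypothetical names used only for this illustration.

```python
# Access size in bytes for each data format; bytes have no alignment constraint.
ACCESS_SIZES = {"byte": 1, "halfword": 2, "word": 4, "doubleword": 8}


def is_aligned(addr, kind):
    """True if a byte address satisfies the alignment rule for the access kind."""
    return addr % ACCESS_SIZES[kind] == 0


word_ok = is_aligned(8, "word")          # 8 is divisible by four
dword_bad = is_aligned(12, "doubleword") # 12 is not divisible by eight
```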

Special instructions are provided for loading and storing words and
doublewords that are not aligned on 4-byte (word) or 8-byte
(doubleword) boundaries: LWL, LWR, SWL, SWR, LDL, LDR, SDL, SDR.
These instructions are used in pairs to provide addressing of
misaligned words with one additional instruction cycle over that
required for aligned words. For each of the two endianness
conventions, Figure 1-8 shows the bytes that are accessed when
addressing a misaligned word with byte address 3.

Figure 1-8 Example Misaligned Word: Byte Address 3
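
How a pair of these instructions assembles a misaligned word can be sketched in software for the Big-endian case. This is a simplified model, not the instruction definition: it assumes LWL fills the left (most-significant) part of the register from the addressed byte to the end of its aligned word, and LWR fills the right part from the start of the aligned word up to the addressed byte. Sign extension in 64-bit mode is ignored, and all names are hypothetical.

```python
def lwl_be(mem, addr, reg):
    """Model of Load Word Left (Big-endian): merge into the high bytes of reg."""
    n = 4 - (addr & 3)                       # bytes from addr to end of aligned word
    value = int.from_bytes(mem[addr:addr + n], "big") << (8 * (4 - n))
    keep = (1 << (8 * (4 - n))) - 1          # low bytes of reg left unchanged
    return value | (reg & keep)


def lwr_be(mem, addr, reg):
    """Model of Load Word Right (Big-endian): merge into the low bytes of reg."""
    n = (addr & 3) + 1                       # bytes from aligned start up to addr
    base = addr & ~3
    value = int.from_bytes(mem[base:base + n], "big")
    keep = ~((1 << (8 * n)) - 1) & 0xFFFFFFFF  # high bytes of reg left unchanged
    return value | (reg & keep)


def load_unaligned_word_be(mem, addr):
    """LWL at the first byte, then LWR at the last byte of the word."""
    return lwr_be(mem, addr + 3, lwl_be(mem, addr, 0))


memory = bytes(range(16))                    # byte at address k has value k
word_at_3 = load_unaligned_word_be(memory, 3)   # bytes 3, 4, 5, 6
```

The first call touches only the aligned word containing byte 3, the second only the aligned word containing byte 6, so neither access is itself misaligned.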



System Control Coprocessor (CP0)



The MIPS ISA allows up to four coprocessors (designated CP0
through CP3). Coprocessor 1 (CP1) is reserved for the on-chip
floating-point coprocessor. Coprocessor 2 (CP2) is reserved for future
definition by MIPS, and the encoding for Coprocessor 3 (CP3) is used
to provide certain extensions to the MIPS ISA. Coprocessor 0 (CP0) is
also incorporated on the CPU chip and supports the virtual memory
system and exception handling. The virtual memory system is
implemented with an on-chip TLB and a group of programmable
registers, as described in Table 1-4.

CP0 translates virtual addresses into physical addresses and manages
exceptions and transitions between kernel, supervisor, and user
states. It also controls the cache subsystem and provides diagnostic
control and error recovery facilities. The R4000 also provides a generic
system timer for interval timing, timekeeping, process accounting,
and time-slicing (see the Count and Compare Registers in Chapter 5).

Figure 1-9 The R4000 CP0 Registers

The CP0 registers shown in Figure 1-9 and described in Table 1-4
manipulate the memory management and exception handling
capabilities of the CPU. Refer to Chapter 4 for a detailed description of
the registers associated with the virtual memory system and to
Chapter 5 for descriptions of the exception processing registers.

Table 1-4 System Control Coprocessor (CP0) Registers

    Number   Register   Description
    0        Index      Programmable pointer into TLB array
    1        Random     Pseudorandom pointer into TLB array (read only)
    2        EntryLo0   Low half of TLB entry for even VPN
    3        EntryLo1   Low half of TLB entry for odd VPN
    4        Context    Pointer to kernel virtual PTE table in 32-bit addressing mode
    5        PageMask   TLB Page Mask
    6        Wired      Number of wired TLB entries
    7                   Reserved
    8        BadVAddr   Bad virtual address
    9        Count      Timer Count
    10       EntryHi    High half of TLB entry
    11       Compare    Timer Compare
    12       SR         Status Register
    13       Cause      Cause of last exception
    14       EPC        Exception Program Counter
    15       PRId       Processor Revision Identifier
    16       Config     Configuration Register
    17       LLAddr     Load Linked Address
    18       WatchLo    Memory reference trap address low bits
    19       WatchHi    Memory reference trap address high bits
    20       XContext   Pointer to kernel virtual PTE table in 64-bit addressing mode
    21-25               Reserved
    26       ECC        Secondary-cache ECC and Primary Parity
    27       CacheErr   Cache Error and Status Register
    28       TagLo      Cache Tag Register
    29       TagHi      Cache Tag Register
    30       ErrorEPC   Error Exception Program Counter
    31                  Reserved



Floating-Point Unit (FPU) 



The MIPS Floating-Point Unit (FPU) operates as a coprocessor for the 
CPU and extends the CPU instruction set to perform arithmetic 
operations on values in floating-point representations. The FPU, with 
associated system software, fully conforms to the requirements of 
ANSI/IEEE Standard 754-1985, "IEEE Standard for Binary Floating- 
Point Arithmetic." 
The FPU features: 
• Full 64-bit operation. The FPU contains sixteen 64-bit registers
or, optionally, thirty-two 64-bit registers that hold
single-precision or double-precision values. The 16 additional
floating-point registers are enabled by setting the FR bit in
the Status register. The FPU also includes a 32-bit
Status/Control Register that provides access to all IEEE-Standard
exception handling capabilities.

• Load and Store Instruction Set. Like the CPU, the FPU
uses a load- and store-oriented instruction set.
Floating-point operations are started in a single cycle and their
execution is overlapped with other fixed-point or
floating-point operations.

• Tightly coupled Coprocessor Interface. The FPU is
on-chip and appears to the programmer as an extension of the
CPU (the FPU is accessed as Coprocessor 1). This forms a
tightly coupled unit with a seamless integration of
floating-point and fixed-point instruction sets. Since each unit
receives and executes instructions in parallel, some
floating-point instructions can execute at the same rate (2
instructions per cycle) as fixed-point instructions. The FPU
instructions are summarized in Chapter 6, Floating-Point
Unit.

On-chip Caches 

The R4000 incorporates on-chip instruction and data caches to keep 
the high-performance pipeline full. Each cache has its own 64-bit data 
path that can be accessed in parallel. The caches can be accessed twice 
in one cycle. Combining this feature with a pipelined, single-cycle 
access of each cache, the cache subsystem provides the integer and 
floating-point units with an aggregate bandwidth of 1.6 GBytes per 
second at a Master Clock frequency of 50 MHz. The R4000 caches are 
described in detail in Chapter 11, Cache Organization, Operation, and 
Coherency. 
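
The 1.6 GBytes per second figure can be reproduced with simple arithmetic. The sketch below is one reading of the paragraph above, not a statement of the hardware design: it assumes two caches (instruction and data), each with a 64-bit path, each accessed twice per master clock cycle. The function and parameter names are illustrative only.

```python
def cache_bandwidth_bytes_per_sec(master_clock_hz, caches=2, path_bits=64,
                                  accesses_per_cycle=2):
    """Aggregate on-chip cache bandwidth under the assumptions stated above."""
    bytes_per_access = path_bits // 8  # a 64-bit path moves 8 bytes per access
    return caches * bytes_per_access * accesses_per_cycle * master_clock_hz


bandwidth = cache_bandwidth_bytes_per_sec(50_000_000)  # 50 MHz master clock
print(bandwidth / 1e9, "GBytes per second")
```

With these assumptions, 2 caches x 8 bytes x 2 accesses x 50 MHz gives 1.6 x 10^9 bytes per second, which agrees with the figure quoted in the text.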
Memory Management System

The R4000 has a physical addressing range of 64 Gbytes (36 bits). 
However, since most systems implement a physical memory smaller 
than 4 Gbytes, the CPU provides a logical expansion of memory space 
by translating addresses composed in a large virtual address space 
into available physical memory addresses. In 32-bit mode, the virtual 
address space is divided into 2 Gbytes per user process and 2 Gbytes 
for the kernel. In 64-bit mode, the virtual address is expanded to allow 
512 Gbytes of user virtual address space. 

The Translation Lookaside Buffer (TLB) 

Virtual memory mapping is assisted by a TLB. This TLB caches virtual
address translations. The fully-associative, on-chip TLB contains 48
entries, and each of these entries maps a pair of variable-sized pages
(page size varies from 4 KBytes to 16 MBytes, increasing by multiples
of 4). An address translation value is tagged with the most-significant
bits of its virtual address (the number of these bits depends upon the
size of the page) and a per-process identifier. If there is no matching
entry in the TLB, an exception is taken and software refills the on-chip
TLB from a Page Table resident in memory. An entry, chosen at
random, is replaced to make way for the new one. This TLB is referred
to as the JTLB.

The R4000 also has a two-entry instruction TLB (ITLB) to assist in
instruction address translation. The ITLB is completely invisible to
software and is present for performance reasons only.
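
The page-size rule for the JTLB described above (4 KBytes to 16 MBytes, each size four times the previous one) can be enumerated in a few lines. This is only an illustrative sketch. The helper names are hypothetical, and the offset-bit count is the usual log2 of the page size, shown to suggest why the number of tag bits varies with page size.

```python
def tlb_page_sizes(smallest=4 * 1024, largest=16 * 1024 * 1024, factor=4):
    """List the page sizes implied by '4 KBytes to 16 MBytes, multiples of 4'."""
    sizes = []
    size = smallest
    while size <= largest:
        sizes.append(size)
        size *= factor
    return sizes


def offset_bits(page_size):
    """Number of low-order address bits that select a byte within the page."""
    return page_size.bit_length() - 1  # exact log2 for a power of two


for size in tlb_page_sizes():
    print(f"{size // 1024:>6} KBytes per page, {offset_bits(size)} offset bits")
```

The loop lists seven sizes. Larger pages use more offset bits, so fewer high-order virtual address bits remain to tag the translation.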

Operating Modes 

The R4000 CPU has three operating modes: User mode, Kernel mode, 
and Supervisor mode. The CPU normally operates in User mode until 
an exception is detected forcing it into Kernel mode. It remains in 
Kernel mode until an Exception Return (ERET) instruction is executed. 
The Supervisor mode can be used to design secure operating systems. 
The manner in which memory addresses are translated or mapped 
depends on the operating mode of the CPU. Chapter 4 describes the 
MMU and Operating modes in greater detail. 
R4000 Superpipeline Architecture

The R4000 exploits instruction-level parallelism using a
superpipelined implementation. The R4000 uses an 8-stage
superpipeline which places no restrictions on the instruction issued.
Under normal circumstances, any two instructions are issued each
cycle.

The internal pipeline of the R4000 operates at twice the frequency of
the master clock. This is shown in Figure 1-10. The 8-stage
superpipeline of the R4000 achieves high throughput by pipelining
cache accesses, shortening register access times, implementing virtual
indexed primary caches, and allowing the latency of functional units
to span multiple pipeline clock cycles (pcycles). In the rest of this
document, the internal pipeline clock and clock cycles are often
referred to as pclock and pcycles respectively. The R4000
superpipeline is covered in greater detail in Chapter 3.

The execution of a single R4000 CPU instruction consists of the
following eight primary steps:

IF   Instruction fetch First half. Virtual address is presented to
     the I-cache and TLB.
IS   Instruction fetch Second half. The I-cache outputs the
     instruction and the TLB generates the physical address.
RF   Register File. Three activities occur in parallel:
     • instruction is decoded and a check is made for
       interlock conditions,
     • instruction tag check is made to determine if there is a
       cache hit or not,
     • operands are fetched from the register file.
EX   Instruction EXecute. One of three activities can occur:
     • if the instruction is a register-to-register operation, an
       arithmetic, logical, shift, multiply, or divide operation
       is performed;
     • if the instruction is a load or store, the data virtual
       address is calculated;
     • if the instruction is a branch, the branch target virtual
       address is calculated and branch conditions are
       checked.
DF   Data cache First half. A virtual address is presented to the
     D-cache and TLB.
DS   Data cache Second half. The D-cache outputs the
     data and the TLB generates the physical address.
TC   Tag Check. A tag check is performed for loads and stores to
     determine if there is a hit or not.
WB   Write Back. The instruction result is written back to the
     register file.
The R4000 uses an 8-stage pipeline; thus, the execution of eight
instructions is overlapped at any one time, as shown in Figure 1-10.

Figure 1-10 R4000 Pipeline and Instruction Overlapping
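
The overlap that Figure 1-10 depicts can be approximated in text. The sketch below is a simplified model, not the documented timing: it assumes one instruction enters the pipeline every pcycle and that no interlocks or stalls occur. The function names are hypothetical.

```python
STAGES = ["IF", "IS", "RF", "EX", "DF", "DS", "TC", "WB"]


def pipeline_chart(n_instructions):
    """Return one row per instruction; the column index is the pcycle number."""
    total_pcycles = n_instructions + len(STAGES) - 1
    rows = []
    for i in range(n_instructions):
        row = ["  "] * total_pcycles
        for s, name in enumerate(STAGES):
            row[i + s] = name  # instruction i occupies stage s in pcycle i + s
        rows.append(row)
    return rows


def in_flight(rows, pcycle):
    """Count the instructions that occupy some stage during the given pcycle."""
    return sum(1 for row in rows if row[pcycle].strip())


chart = pipeline_chart(10)
for row in chart:
    print(" ".join(row))
```

In this model the pipeline is full from the eighth pcycle onward, with eight instructions in flight, one per stage.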



Cache Memory Hierarchy 

To achieve its high performance in uniprocessor and multiprocessor 
systems, the R4000 supports a cache memory hierarchy that increases 
memory access bandwidth and reduces the latency of load and store 
instructions. The two-level cache memory hierarchy consists of on- 
chip instruction and data caches, and an optional external secondary 
cache that can vary in size from 128 Kbytes to 4 Mbytes. 
The secondary cache is assumed to consist of one bank of industry- 
standard static RAM (SRAM) with output enables. The secondary 
cache consists of a quadword (128 bit) wide data array and a 25-bit 
wide tag array. Check fields are added to both the data and tag arrays 
to improve data integrity. The secondary cache may be configured as 
either a joint cache or split instruction/data cache. The maximum 
secondary cache size is 4 MBytes and the minimum secondary cache 
size is 128 KBytes for a joint cache and 256 KBytes for split instruction/ 
data cache. The secondary cache is direct-mapped, and is addressed 
with the lower part of the physical address. 
A detailed description of the cache hierarchy is given in Chapter 11, 
Cache Organization, Operation, and Coherency. 

Secondary Cache Interface 

The R4000SC and R4000MC versions of the R4000 interface to an 
optional secondary cache. The R4000 provides all of the secondary 
cache control circuitry, including ECC protection, on chip. The 
secondary cache interface consists of a 128-bit data bus, a 25-bit tag 
bus, an 18-bit address bus and SRAM control signals. The 128-bit wide 
data bus minimizes cache miss penalty, and allows the use of standard 
low-cost SRAMs in the secondary cache design. 

System Interface 

The R4000 supports a 64-bit system interface that can be used to 
construct uniprocessor systems with a direct DRAM interface with or 
without a secondary cache or cache-coherent multiprocessor systems. 
The interface consists of a 64-bit multiplexed address and data bus 
with 8 check bits and a 9-bit parity-protected command bus. In 
addition, there are 8 handshake signals. The interface has a simple 
timing specification and is capable of transferring data between the 
processor and memory at a peak rate of 400 Mbytes/second at 50 
MHz. 
R4000 Configurations

The R4000 is packaged in three different configurations. All 
processors are implemented in sub-1 micron CMOS technology: 

• The R4000SC is designed for use in high-performance 
uniprocessor systems. It is packaged in a 447-pin LGA/ 
PGA and includes integrated control for large secondary 
caches built from standard SRAMs. 

• The R4000MC is designed for use in large cache-coherent 
multiprocessor systems. The R4000MC is also packaged in 
447-pin LGA/PGA and includes, in addition, support for a 
wide variety of bus designs and cache-coherency 
mechanisms. 

• The R4000PC is designed for cost-sensitive systems such as 
inexpensive desktop systems and high-end embedded 
controllers. It is packaged in a 179-pin PGA. The R4000PC 
does not support a secondary cache. 



Compatibility 



The R4000 provides complete application software compatibility with 
the MIPS R2000, R3000, and R6000 processors. Although the 
architecture has evolved in response to a compromise between 
software and hardware resources in the computer system, this 
evolution maintains object-code compatibility for programs that 
execute in User mode (see Chapter 4, Memory Management System, for 
a description of operating modes). Like its predecessors, the R4000 
implements the MIPS Instruction Set Architecture (ISA) for user-mode 
programs; this guarantees that user-mode programs conforming to
the ISA will execute on any MIPS hardware implementation. 
CPU Instruction Set Summary



This chapter provides an overview of the CPU instruction set by 
summarizing each instruction category in a table. Refer to Appendix 
A for individual descriptions of each CPU instruction. 
The FPU instructions are summarized in Chapter 6, and are described 
in detail in Appendix B. 



Instruction Formats 



Each CPU instruction consists of a single word (32 bits) aligned on a
word boundary. There are three instruction formats, as shown in
Figure 2-1. Limiting the instruction set to these three formats
simplifies instruction decoding; the compiler synthesizes more
complicated (and less frequently used) operations and addressing
modes from the simpler instructions. In the MIPS architecture,
coprocessor instructions are implementation-dependent; see
Appendix A for R4000 Coprocessor instruction details.
I-Type (Immediate)

 31    26 25    21 20    16 15                          0
|   op   |   rs   |   rt   |         immediate           |

J-Type (Jump)

 31    26 25                                            0
|   op   |                   target                      |

R-Type (Register)

 31    26 25    21 20    16 15    11 10     6 5         0
|   op   |   rs   |   rt   |   rd   |   sa   |   funct   |

op          is a 6-bit operation code
rs          is a 5-bit source register specifier
rt          is a 5-bit target (source/destination)
            register or branch condition
immediate   is a 16-bit immediate value, branch
            displacement or address displacement
target      is a 26-bit jump target address
rd          is a 5-bit destination register specifier
sa          is a 5-bit shift amount
funct       is a 6-bit function field

Figure 2-1 CPU Instruction Formats
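
The field boundaries in Figure 2-1 translate directly into shifts and masks. The sketch below is illustrative only; the function names are hypothetical, and the sample field values were chosen simply to exercise the I-type layout.

```python
def decode_fields(word):
    """Split a 32-bit instruction word into every field named in Figure 2-1.

    Which fields are meaningful depends on the format (I, J, or R) that the
    op field selects; this helper simply extracts all of them.
    """
    return {
        "op": (word >> 26) & 0x3F,      # bits 31..26
        "rs": (word >> 21) & 0x1F,      # bits 25..21
        "rt": (word >> 16) & 0x1F,      # bits 20..16
        "rd": (word >> 11) & 0x1F,      # bits 15..11 (R-type)
        "sa": (word >> 6) & 0x1F,       # bits 10..6  (R-type)
        "funct": word & 0x3F,           # bits 5..0   (R-type)
        "immediate": word & 0xFFFF,     # bits 15..0  (I-type)
        "target": word & 0x3FFFFFF,     # bits 25..0  (J-type)
    }


def encode_i_type(op, rs, rt, immediate):
    """Assemble an I-type word from its four fields."""
    return (op << 26) | (rs << 21) | (rt << 16) | (immediate & 0xFFFF)


word = encode_i_type(0x23, 29, 8, -4)  # sample values for illustration
print(hex(word), decode_fields(word))
```

Masking the immediate to 16 bits lets a negative displacement such as -4 be stored in two's-complement form in the low halfword.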

Load and Store Instructions 



Load and Store instructions move data between memory and the
general registers. They are all immediate (I-type) instructions. The
only addressing mode that load and store instructions directly
support is base register plus 16-bit signed immediate offset.
The instruction immediately following a load can use the contents of
the loaded register. In such cases, hardware interlocks require
additional real cycles; consequently, scheduling load delay slots is still
desirable, for both performance and R3000 compatibility. However,
the scheduling of load delay slots is not absolutely required for
functional code.
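
The single addressing mode, base register plus 16-bit signed offset, can be sketched as follows. The helper names are hypothetical, and the 32-bit address width is an assumption made for brevity.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def effective_address(base, offset, address_bits=32):
    """Add the sign-extended offset to the base register contents."""
    return (base + sign_extend16(offset)) & ((1 << address_bits) - 1)


print(hex(effective_address(0x1000, 0x0010)))  # positive displacement
print(hex(effective_address(0x1000, 0xFFFC)))  # 0xFFFC encodes -4
```

Because the offset is signed, a load or store can reach from 32768 bytes below to 32767 bytes above the address held in the base register.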
The load and store instruction opcode determines the access type
which indicates the size of the data item to be loaded or stored as
shown in Figure 2-2. Regardless of access type or byte-numbering
order (endianness), the address specifies the byte with the smallest
byte address in the addressed field. For a Big-endian configuration, it
is the most-significant byte; for a Little-endian configuration, it is the
least-significant byte.

The bytes that are used within the addressed doubleword can be
determined from the access type and the three low-order bits of the
address, as shown in Figure 2-2. Only the combinations shown in
Figure 2-2 are permissible; other combinations cause address error
exceptions. Table 2-1 lists the load and store instructions defined by
the ISA. Table 2-2 lists the instructions which are extensions to the ISA.

Figure 2-2 Byte Specifications for Load and Store Instructions
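
The endianness rule above (the address always names the lowest-addressed byte, which is the most-significant byte in a Big-endian configuration and the least-significant byte in a Little-endian one) can be illustrated with a four-byte memory image. This is only a sketch; `load_word` is a hypothetical helper, not an instruction model.

```python
def load_word(memory, address, byteorder):
    """Assemble the 4 bytes starting at address using the given byte order."""
    return int.from_bytes(memory[address:address + 4], byteorder)


memory = bytes([0x12, 0x34, 0x56, 0x78])  # byte address 0 holds 0x12

big = load_word(memory, 0, "big")        # byte 0 is the most-significant byte
little = load_word(memory, 0, "little")  # byte 0 is the least-significant byte
print(hex(big), hex(little))
```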
Table 2-1 Load and Store Instruction Summary (ISA)

Instruction          Format and Description
                     (I-type fields: op, base, rt, offset)

Load Byte            LB rt,offset(base)
                     Sign-extend 16-bit offset and add to contents of
                     register base to form address. Sign-extend contents
                     of addressed byte and load into register rt.

Load Byte            LBU rt,offset(base)
Unsigned             Sign-extend 16-bit offset and add to contents of
                     register base to form address. Zero-extend contents
                     of addressed byte and load into register rt.

Load Halfword        LH rt,offset(base)
                     Sign-extend 16-bit offset and add to contents of
                     register base to form address. Sign-extend contents
                     of addressed halfword and load into register rt.

Load Halfword        LHU rt,offset(base)
Unsigned             Sign-extend 16-bit offset and add to contents of
                     register base to form address. Zero-extend contents
                     of addressed halfword and load into register rt.

Load Word            LW rt,offset(base)
                     Sign-extend 16-bit offset and add to contents of
                     register base to form address. Load contents of
                     addressed word into register rt (sign-extended if
                     64-bit mode).

Load Word Left       LWL rt,offset(base)
                     Sign-extend 16-bit offset and add to contents of
                     register base to form address. Shift addressed word
                     left so that addressed byte is leftmost byte of a
                     word. Merge bytes from memory with contents of
                     register rt and load the result into register rt
                     (sign-extended if 64-bit mode).

Load Word Right      LWR rt,offset(base)
                     Sign-extend 16-bit offset and add to contents of
                     register base to form address. Shift addressed word
                     right so that addressed byte is rightmost byte of a
                     word. Merge bytes from memory with contents of
                     register rt and load the result into register rt
                     (sign-extended if 64-bit mode).

Store Byte           SB rt,offset(base)
                     Sign-extend 16-bit offset and add to contents of
                     register base to form address. Store the
                     least-significant byte of register rt at addressed
                     location.

Store Halfword       SH rt,offset(base)
                     Sign-extend 16-bit offset and add to contents of
                     register base to form address. Store the
                     least-significant halfword of register rt at
                     addressed location.

Store Word           SW rt,offset(base)
                     Sign-extend 16-bit offset and add to contents of
                     register base to form address. Store the contents of
                     the least-significant word of register rt at
                     addressed location.

Store Word Left      SWL rt,offset(base)
                     Sign-extend 16-bit offset and add to contents of
                     register base to form address. Shift contents of
                     register rt left so that the leftmost byte of the
                     low-order word is in the position of the addressed
                     byte. Store the bytes containing the original data
                     in the low-order word into corresponding bytes at
                     addressed byte.

Store Word Right     SWR rt,offset(base)
                     Sign-extend 16-bit offset and add to contents of
                     register base to form address. Shift contents of
                     register rt right so that the rightmost byte of the
                     low-order word is in the position of the addressed
                     byte. Store the bytes containing the original data
                     in the low-order word into corresponding bytes at
                     addressed byte.
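
The LWL and LWR descriptions in Table 2-1 are easier to follow with a worked case. The sketch below models the shift-and-merge behavior for a 32-bit register. It assumes a Big-endian configuration and ignores sign extension in 64-bit mode; the helper names and the memory contents are illustrative.

```python
def _aligned_word(memory, address):
    """Fetch the aligned word containing address (Big-endian assumed)."""
    base = address & ~3
    return int.from_bytes(memory[base:base + 4], "big")


def lwl(memory, address, rt):
    """Shift the word left so the addressed byte is leftmost; merge into rt."""
    shift = 8 * (address & 3)
    mask = (0xFFFFFFFF << shift) & 0xFFFFFFFF      # bytes taken from memory
    loaded = (_aligned_word(memory, address) << shift) & 0xFFFFFFFF
    return (rt & ~mask & 0xFFFFFFFF) | loaded


def lwr(memory, address, rt):
    """Shift the word right so the addressed byte is rightmost; merge into rt."""
    shift = 8 * (3 - (address & 3))
    mask = 0xFFFFFFFF >> shift                     # bytes taken from memory
    loaded = _aligned_word(memory, address) >> shift
    return (rt & ~mask & 0xFFFFFFFF) | loaded


def load_unaligned_word(memory, address):
    """Model the pair LWL rt,0(base) followed by LWR rt,3(base)."""
    return lwr(memory, address + 3, lwl(memory, address, 0))


memory = bytes([0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77])
print(hex(load_unaligned_word(memory, 1)))
```

The LWL step supplies the high-order bytes from the first aligned word and the LWR step fills the remaining low-order bytes from the next one, so the pair assembles a word that straddles a word boundary.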
Table 2-2 Load and Store Instruction (ISA Extensions)

Instruction          Format and Description
                     (I-type fields: op, base, rt, offset)

Load Doubleword      LD rt,offset(base)
                     Sign-extend 16-bit offset and add to contents of
                     register base to form address. Load contents of
                     addressed doubleword into register rt.

Load Doubleword      LDL rt,offset(base)
Left                 Sign-extend 16-bit offset and add to contents of
                     register base to form address. Shift addressed
                     doubleword left so that addressed byte is leftmost
                     byte of a doubleword. Merge bytes from memory with
                     contents of register rt and load the result into
                     register rt.

Load Doubleword      LDR rt,offset(base)
Right                Sign-extend 16-bit offset and add to contents of
                     register base to form address. Shift addressed
                     doubleword right so that addressed byte is rightmost
                     byte of a doubleword. Merge bytes from memory with
                     contents of register rt and load the result into
                     register rt.

Load Linked          LL rt,offset(base)
                     Sign-extend 16-bit offset and add to contents of
                     register base to form address. Sign-extend contents
                     of addressed word and load into register rt.

Load Linked          LLD rt,offset(base)
Doubleword           Sign-extend 16-bit offset and add to contents of
                     register base to form address. Load contents of
                     addressed doubleword into register rt.

Load Word            LWU rt,offset(base)
Unsigned             Sign-extend 16-bit offset and add to contents of
                     register base to form address. Zero-extend contents
                     of addressed word and load into register rt.

Store Doubleword     SD rt,offset(base)
                     Sign-extend 16-bit offset and add to contents of
                     register base to form address. Store contents of
                     register rt at addressed location.

Store Conditional    SC rt,offset(base)
                     Sign-extend 16-bit offset and add to contents of
                     register base to form address. Conditionally store
                     low-order word of register rt at addressed location.

Store Conditional    SCD rt,offset(base)
Doubleword           Sign-extend 16-bit offset and add to contents of
                     register base to form address. Conditionally store
                     contents of register rt at addressed location.

Store Doubleword     SDL rt,offset(base)
Left                 Sign-extend 16-bit offset and add to contents of
                     register base to form address. Shift contents of
                     register rt left so that the leftmost byte of the
                     doubleword is in the position of the addressed byte.
                     Store the bytes containing the original data in the
                     low-order doubleword into corresponding bytes at the
                     addressed byte.

Store Doubleword     SDR rt,offset(base)
Right                Sign-extend 16-bit offset and add to contents of
                     register base to form address. Shift contents of
                     register rt right so that the rightmost byte of the
                     doubleword is in the position of the addressed byte.
                     Store the bytes containing the original data in the
                     low-order doubleword into corresponding bytes at the
                     addressed byte.

Sync                 SYNC
                     Complete all outstanding load or store instructions
                     before allowing any new load or store instruction to
                     start.
Computational Instructions

Computational instructions perform arithmetic, logical, shift,
multiply, and divide operations on values in registers. They occur in
both register (R-type) format, in which both operands are registers,
and immediate (I-type) format, in which one operand is a 16-bit
immediate. There are four categories of computational instructions:

• ALU Immediate instructions

• Three-Operand Register-Type instructions

• Shift instructions

• Multiply and Divide instructions

When operating in 64-bit mode, 32-bit operands must be correctly
sign-extended. The result of operations which use incorrectly
sign-extended 32-bit values is unpredictable.
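
The sign-extension requirement for 32-bit operands in 64-bit mode can be made concrete with a short sketch. It models a 64-bit register as a Python integer; the helper names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def sign_extend32_to_64(value):
    """Copy bit 31 of a 32-bit value into bits 63..32 of a 64-bit register."""
    value &= 0xFFFFFFFF
    if value & 0x80000000:
        value |= 0xFFFFFFFF00000000
    return value


def is_correctly_sign_extended(reg64):
    """True when bits 63..32 of the register all equal bit 31."""
    return sign_extend32_to_64(reg64) == (reg64 & MASK64)


print(hex(sign_extend32_to_64(0x80000000)))
print(is_correctly_sign_extended(0x0000000080000000))
```

A register holding 0x0000000080000000 is not a correctly sign-extended 32-bit value, so under the rule above a 32-bit operation that used it would produce an unpredictable result.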
Table 2-3 ALU Immediate Instruction Summary

Instruction          Format and Description
                     (I-type fields: op, rs, rt, immediate)

ADD Immediate        ADDI rt,rs,immediate
                     Add 16-bit sign-extended immediate to register rs and
                     place the 32-bit result (sign-extended in 64-bit
                     mode) in register rt. Trap on 2's-complement
                     overflow.

ADD Immediate        ADDIU rt,rs,immediate
Unsigned             Add 16-bit sign-extended immediate to register rs and
                     place the 32-bit result (sign-extended in 64-bit
                     mode) in register rt. Do not trap on overflow.

Set on Less Than     SLTI rt,rs,immediate
Immediate            Compare 16-bit sign-extended immediate with register
                     rs as signed integers. Result is set to 1 if rs is
                     less than immediate; otherwise result is set to 0.
                     Place result in register rt.

Set on Less Than     SLTIU rt,rs,immediate
Immediate            Compare 16-bit sign-extended immediate with register
Unsigned             rs as unsigned integers. Result is set to 1 if rs is
                     less than immediate; otherwise result is set to 0.
                     Place result in register rt.

AND Immediate        ANDI rt,rs,immediate
                     Zero-extend 16-bit immediate, AND with contents of
                     register rs and place the result in register rt.

OR Immediate         ORI rt,rs,immediate
                     Zero-extend 16-bit immediate, OR with contents of
                     register rs and place the result in register rt.

Exclusive OR         XORI rt,rs,immediate
Immediate            Zero-extend 16-bit immediate, exclusive OR with
                     contents of register rs and place the result in
                     register rt.

Load Upper           LUI rt,immediate
Immediate            Shift 16-bit immediate left 16 bits. Set
                     least-significant 16 bits of word to zeros. Store the
                     result in register rt.
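
One common use of the Table 2-3 instructions is building a 32-bit constant from two 16-bit immediates with LUI followed by ORI. The sketch below models only the 32-bit arithmetic and ignores sign extension in 64-bit mode; the function names are hypothetical. It also shows why the zero-extending ORI is used for the second step rather than ADDIU, which sign-extends its immediate.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def lui(immediate):
    """Shift the immediate left 16 bits; the low 16 bits become zeros."""
    return (immediate & 0xFFFF) << 16


def ori(rs, immediate):
    """OR the register with the zero-extended immediate."""
    return rs | (immediate & 0xFFFF)


def addiu(rs, immediate):
    """Add the sign-extended immediate; wrap to 32 bits without trapping."""
    return (rs + sign_extend16(immediate)) & 0xFFFFFFFF


def load_constant32(value):
    """Model the LUI/ORI pair that places a 32-bit constant in a register."""
    return ori(lui(value >> 16), value & 0xFFFF)


print(hex(load_constant32(0xDEADBEEF)))
print(hex(addiu(lui(0xDEAD), 0xBEEF)))  # differs: 0xBEEF is sign-extended
```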
Table 2-4 ALU Immediate Instruction (ISA Extensions)

Instruction          Format and Description
                     (I-type fields: op, rs, rt, immediate)

DADD Immediate       DADDI rt,rs,immediate
                     Add 16-bit sign-extended immediate to register rs and
                     place the 64-bit result in register rt. Trap on
                     2's-complement overflow.

DADD Immediate       DADDIU rt,rs,immediate
Unsigned             Add 16-bit sign-extended immediate to register rs and
                     place the 64-bit result in register rt. Do not trap
                     on overflow.
Table 2-5 Three-Operand Register-Type Instruction Summary

Instruction          Format and Description
                     (R-type fields: op, rs, rt, rd, sa, function)

Add                  ADD rd,rs,rt
                     Add contents of registers rs and rt and place the
                     32-bit result (sign-extended in 64-bit mode) in
                     register rd. Trap on 2's-complement overflow.

Add Unsigned         ADDU rd,rs,rt
                     Add contents of registers rs and rt and place the
                     32-bit result (sign-extended in 64-bit mode) in
                     register rd. Do not trap on overflow.

Subtract             SUB rd,rs,rt
                     Subtract contents of register rt from rs and place
                     the 32-bit result (sign-extended in 64-bit mode) in
                     register rd. Trap on 2's-complement overflow.

Subtract             SUBU rd,rs,rt
Unsigned             Subtract contents of register rt from rs and place
                     the 32-bit result (sign-extended in 64-bit mode) in
                     register rd. Do not trap on overflow.

Set on Less Than     SLT rd,rs,rt
                     Compare contents of register rt to register rs as
                     signed integers. Result is set to 1 if rs is less
                     than rt; otherwise result is set to 0. Place result
                     in register rd.

Set on Less Than     SLTU rd,rs,rt
Unsigned             Compare contents of register rt to register rs as
                     unsigned integers. Result is set to 1 if rs is less
                     than rt; otherwise result is set to 0. Place result
                     in register rd.

AND                  AND rd,rs,rt
                     Bitwise AND the contents of registers rs and rt, and
                     place the result in register rd.

OR                   OR rd,rs,rt
                     Bitwise OR the contents of registers rs and rt, and
                     place the result in register rd.

Exclusive OR         XOR rd,rs,rt
                     Bitwise exclusive OR the contents of registers rs and
                     rt, and place the result in register rd.

NOR                  NOR rd,rs,rt
                     Bitwise NOR the contents of registers rs and rt, and
                     place the result in register rd.
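
The difference between the trapping and non-trapping adds in Table 2-5 can be sketched as follows. This is a 32-bit software model only. The `IntegerOverflow` exception is a hypothetical stand-in for the overflow trap and is not part of any MIPS toolchain.

```python
class IntegerOverflow(Exception):
    """Hypothetical stand-in for the 2's-complement overflow trap."""


def to_signed32(value):
    """Interpret the low 32 bits of value as a two's-complement number."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def addu(rs, rt):
    """ADDU: wrap the sum to 32 bits and never trap."""
    return (rs + rt) & 0xFFFFFFFF


def add(rs, rt):
    """ADD: same sum, but signal when the signed result does not fit."""
    total = to_signed32(rs) + to_signed32(rt)
    if not -(1 << 31) <= total < (1 << 31):
        raise IntegerOverflow("2's-complement overflow")
    return total & 0xFFFFFFFF


print(hex(addu(0x7FFFFFFF, 1)))  # wraps silently
try:
    add(0x7FFFFFFF, 1)
except IntegerOverflow as exc:
    print("trap:", exc)
```

Both instructions compute the same 32-bit sum; they differ only in whether signed overflow is reported.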
Table 2-6 Three-Operand Register-Type Instruction (ISA Extensions)

Instruction          Format and Description
                     (R-type fields: op, rs, rt, rd, sa, function)

Doubleword Add       DADD rd,rs,rt
                     Add contents of registers rs and rt and place the
                     64-bit result in register rd. Trap on 2's-complement
                     overflow.

Doubleword Add       DADDU rd,rs,rt
Unsigned             Add contents of registers rs and rt and place the
                     64-bit result in register rd. Do not trap on
                     overflow.

Doubleword           DSUB rd,rs,rt
Subtract             Subtract contents of register rt from rs and place
                     the 64-bit result in register rd. Trap on
                     2's-complement overflow.

Doubleword           DSUBU rd,rs,rt
Subtract             Subtract contents of register rt from rs and place
Unsigned             the 64-bit result in register rd. Do not trap on
                     overflow.






Table 2-7 Shift Instruction Summary

Instruction          Format and Description
                     (R-type fields: op, rs, rt, rd, sa, function)

Shift Left           SLL rd,rt,sa
Logical              Shift the contents of register rt left by sa bits,
                     and insert zeros into the low-order bits. Place the
                     32-bit result in register rd (sign-extended in
                     64-bit mode).

Shift Right          SRL rd,rt,sa
Logical              Shift the contents of register rt right by sa bits,
                     and insert zeros into the high-order bits. Place the
                     32-bit result in register rd (sign-extended in
                     64-bit mode).

Shift Right          SRA rd,rt,sa
Arithmetic           Shift the contents of register rt right by sa bits,
                     and sign-extend the high-order bits. Place the
                     32-bit result in register rd (sign-extended in
                     64-bit mode).

Shift Left           SLLV rd,rt,rs
Logical Variable     Shift the contents of register rt left. The
                     low-order 5 bits of register rs specify the number
                     of bits to shift left; insert zeros into the
                     low-order bits of rt and place the 32-bit result in
                     register rd (sign-extended in 64-bit mode).

Shift Right          SRLV rd,rt,rs
Logical Variable     Shift the contents of register rt right. The
                     low-order 5 bits of register rs specify the number
                     of bits to shift right; insert zeros into the
                     high-order bits of rt and place the 32-bit result in
                     register rd (sign-extended in 64-bit mode).

Shift Right          SRAV rd,rt,rs
Arithmetic           Shift the contents of register rt right. The
Variable             low-order 5 bits of register rs specify the number
                     of bits to shift right; sign-extend the high-order
                     bits of rt and place the 32-bit result in register
                     rd (sign-extended in 64-bit mode).
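
The difference between logical and arithmetic right shifts in Table 2-7, and the use of only the low-order 5 bits of rs by the variable forms, can be sketched on 32-bit values. The function names mirror the mnemonics for readability, but this is an illustrative model only; sign extension of the result in 64-bit mode is not modelled.

```python
def sll(rt, sa):
    """Shift left, inserting zeros into the low-order bits."""
    return (rt << (sa & 31)) & 0xFFFFFFFF


def srl(rt, sa):
    """Shift right, inserting zeros into the high-order bits."""
    return (rt & 0xFFFFFFFF) >> (sa & 31)


def sra(rt, sa):
    """Shift right, replicating the sign bit into the high-order bits."""
    rt &= 0xFFFFFFFF
    signed = rt - (1 << 32) if rt & 0x80000000 else rt
    return (signed >> (sa & 31)) & 0xFFFFFFFF


def srlv(rt, rs):
    """Variable form: only the low-order 5 bits of rs give the shift count."""
    return srl(rt, rs & 0x1F)


print(hex(srl(0x80000000, 4)), hex(sra(0x80000000, 4)))
print(hex(srlv(0x80000000, 36)))  # 36 & 0x1F == 4
```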
Table 2-8 Shift Instruction (ISA Extensions)

Instruction          Format and Description
                     (R-type fields: op, rs, rt, rd, sa, function)

Doubleword Shift     DSLL rd,rt,sa
Left Logical         Shift the contents of register rt left by sa bits,
                     and insert zeros into the low-order bits. Place the
                     64-bit result in register rd.

Doubleword Shift     DSRL rd,rt,sa
Right Logical        Shift the contents of register rt right by sa bits,
                     and insert zeros into the high-order bits. Place the
                     64-bit result in register rd.

Doubleword Shift     DSRA rd,rt,sa
Right Arithmetic     Shift the contents of register rt right by sa bits,
                     and sign-extend the high-order bits. Place the
                     64-bit result in register rd.

Doubleword Shift     DSLLV rd,rt,rs
Left Logical         Shift the contents of register rt left. The
Variable             low-order 6 bits of register rs specify the number
                     of bits to shift left; insert zeros into the
                     low-order bits of rt and place the 64-bit result in
                     register rd.

Doubleword Shift     DSRLV rd,rt,rs
Right Logical        Shift the contents of register rt right. The
Variable             low-order 6 bits of register rs specify the number
                     of bits to shift right; insert zeros into the
                     high-order bits of rt and place the 64-bit result in
                     register rd.

Doubleword Shift     DSRAV rd,rt,rs
Right Arithmetic     Shift the contents of register rt right. The
Variable             low-order 6 bits of register rs specify the number
                     of bits to shift right; sign-extend the high-order
                     bits of rt and place the 64-bit result in register
                     rd.

Doubleword Shift     DSLL32 rd,rt,sa
Left Logical+32      Shift the contents of register rt left by 32+sa
                     bits, and insert zeros into the low-order bits.
                     Place the 64-bit result in register rd.

Doubleword Shift     DSRL32 rd,rt,sa
Right Logical+32     Shift the contents of register rt right by 32+sa
                     bits, and insert zeros into the high-order bits.
                     Place the 64-bit result in register rd.

Doubleword Shift     DSRA32 rd,rt,sa
Right                Shift the contents of register rt right by 32+sa
Arithmetic+32        bits, and sign-extend the high-order bits. Place the
                     64-bit result in register rd.
Table 2-9 Multiply and Divide Instruction Summary

Instruction          Format and Description
                     (R-type fields: op, rs, rt, rd, sa, function)

Multiply             MULT rs,rt
                     Multiply the contents of registers rs and rt as
                     2's-complement values. Place the 64-bit result in
                     special registers HI and LO (sign-extended in
                     64-bit mode).

Multiply Unsigned    MULTU rs,rt
                     Multiply the contents of registers rs and rt as
                     unsigned integers. Place the 64-bit result in
                     special registers HI and LO (sign-extended in
                     64-bit mode).

Divide               DIV rs,rt
                     Divide the contents of register rs by rt, treating
                     operands as 2's-complement values. Place the 32-bit
                     quotient in special register LO and the 32-bit
                     remainder in HI (sign-extended in 64-bit mode).

Divide Unsigned      DIVU rs,rt
                     Divide the contents of register rs by rt, treating
                     operands as unsigned values. Place the 32-bit
                     quotient in special register LO and the 32-bit
                     remainder in HI (sign-extended in 64-bit mode).

Move From HI         MFHI rd
                     Move the contents of special register HI to
                     register rd.

Move From LO         MFLO rd
                     Move the contents of special register LO to
                     register rd.

Move To HI           MTHI rs
                     Move the contents of register rs to special
                     register HI.

Move To LO           MTLO rs
                     Move the contents of register rs to special
                     register LO.
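
The way MULT and MULTU split a 64-bit product across the HI and LO special registers can be sketched as below. The model covers 32-bit operands only and assumes that HI receives the high-order word and LO the low-order word; sign extension of each half in 64-bit mode is not modelled. The function names are illustrative.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(value):
    """Interpret the low 32 bits of value as a two's-complement number."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def mult(rs, rt):
    """Signed multiply; returns (HI, LO) as 32-bit halves of the product."""
    product = (to_signed32(rs) * to_signed32(rt)) & 0xFFFFFFFFFFFFFFFF
    return product >> 32, product & MASK32


def multu(rs, rt):
    """Unsigned multiply; returns (HI, LO) as 32-bit halves of the product."""
    product = (rs & MASK32) * (rt & MASK32)
    return product >> 32, product & MASK32


hi, lo = mult(0xFFFFFFFF, 2)    # operands read as -1 and 2
print(hex(hi), hex(lo))
hi, lo = multu(0xFFFFFFFF, 2)   # operands read as 4294967295 and 2
print(hex(hi), hex(lo))
```

The same bit patterns give the same LO value in both cases but different HI values, which is why the ISA provides separate signed and unsigned multiplies.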
Table 2-10 Multiply and Divide Instruction (ISA Extensions)

Instruction          Format and Description
                     (R-type fields: op, rs, rt, rd, sa, function)

Doubleword           DMULT rs,rt
Multiply             Multiply the contents of registers rs and rt as
                     2's-complement values. Place the 128-bit result in
                     special registers HI and LO.

Doubleword           DMULTU rs,rt
Multiply Unsigned    Multiply the contents of registers rs and rt as
                     unsigned integers. Place the 128-bit result in
                     special registers HI and LO.

Doubleword Divide    DDIV rs,rt
                     Divide the contents of register rs by rt, treating
                     operands as 2's-complement values. Place the 64-bit
                     quotient in special register LO and the 64-bit
                     remainder in HI.

Doubleword Divide    DDIVU rs,rt
Unsigned             Divide the contents of register rs by rt, treating
                     operands as unsigned values. Place the 64-bit
                     quotient in special register LO and the 64-bit
                     remainder in HI.



The number of cycles required for multiply and divide operations is
shown in Table 2-11. The MFHI and MFLO instructions are
interlocked so that any attempt to read HI or LO before prior operations
have completed will cause execution of these instructions to be
delayed until the operation finishes. Table 2-11 gives the number of
pcycles required between a MULT, MULTU, DIV, DIVU, DMULT,
DMULTU, DDIV or DDIVU operation, and a subsequent MFHI or
MFLO operation, to resolve an interlock or stall.
Table 2-11 Multiply/Divide Instruction Cycle Timing

                  MULT  MULTU  DIV  DIVU  DMULT  DMULTU  DDIV  DDIVU
PCycles Required  10    10     69   69    20     20      133   133
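The interlock described above can be estimated with a small lookup. This is a rough sketch: the cycle counts come from Table 2-11, but the function name and the assumption that each intervening instruction takes exactly one pcycle are illustrative simplifications.

```python
# Pcycles required between the operation and a following MFHI/MFLO (Table 2-11).
PCYCLES_REQUIRED = {
    "MULT": 10, "MULTU": 10,
    "DIV": 69, "DIVU": 69,
    "DMULT": 20, "DMULTU": 20,
    "DDIV": 133, "DDIVU": 133,
}


def remaining_stall(op, pcycles_elapsed):
    """Pcycles an MFHI or MFLO would still wait after `pcycles_elapsed`.

    Assumes the elapsed pcycles were spent on independent instructions.
    """
    return max(0, PCYCLES_REQUIRED[op] - pcycles_elapsed)
```

Scheduling independent work between a divide and the instruction that reads its result hides part of the latency; once enough pcycles have elapsed, the read proceeds without delay.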
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Jump and Branch Instructions 



Jump and branch instructions change the control flow of a program.
All jump and branch instructions occur with an architectural delay of
one instruction: that is, the instruction immediately following the
jump or branch (the instruction in the delay slot) is always executed
while the target instruction is being fetched from storage. (Taken
branches have a three-cycle penalty in this implementation. Refer to
Chapter 3, The R4000 Pipeline, for details.)

Subroutine calls in high-level languages are usually implemented
with Jump or JumpAndLink instructions. Both are J-type instructions.
In this format, the 26-bit target address is shifted left two bits, and
combined with the high-order four bits of the current program
counter to form an absolute address.

Returns, dispatches, and large cross-page jumps are usually
implemented with the JumpRegister and JumpAndLinkRegister
instructions. Both are R-type instructions which take a 32-bit or 64-bit
byte address contained in one of the general-purpose registers.

Table 2-12 and Table 2-13 summarize those CPU jump and branch
instructions that are shared by all MIPS R-Series processors;
Table 2-14 summarizes branch instructions that are extensions for the
R4000.

Table 2-12 Jump Instruction Summary

Instruction
Format and Description    op target

Jump
J target
Shift the 26-bit target address left two bits, combine with high-order four
bits of the PC, and jump to the address with a 1-instruction delay.

Jump And Link
JAL target
Shift the 26-bit target address left two bits, combine with high-order four
bits of the PC, and jump to the address with a 1-instruction delay. Place the
address of the instruction following the delay slot in r31 (Link register).

Instruction
Format and Description    op rs rd sa function

Jump Register
JR rs
Jump to the address contained in register rs, with a 1-instruction delay.

Jump And Link Register
JALR rs,rd
Jump to the address contained in register rs, with a 1-instruction delay. Place
the address of the instruction following the delay slot in register rd.
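The J-type address formation and the link address can be sketched as below. This is an illustration under stated assumptions: 32-bit addressing, 4-byte instructions, and hypothetical function names. The pc argument stands for the "current program counter" of the description above.

```python
def jump_target(pc, target26):
    """Absolute address for J and JAL.

    The 26-bit target is shifted left two bits and combined with the
    high-order four bits of the program counter.
    """
    return (pc & 0xF0000000) | ((target26 & 0x03FFFFFF) << 2)


def link_address(jump_pc):
    """Address of the instruction following the delay slot.

    The jump occupies 4 bytes and so does its delay slot, hence +8.
    """
    return (jump_pc + 8) & 0xFFFFFFFF
```

Because only the low 28 bits come from the instruction, a J or JAL can reach targets only within the 256-Mbyte region selected by the high-order four bits; register jumps remove that limit.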
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The following description is common to Table 2-13 and Table 2-14 

• Branch target: All branch instruction target addresses are
computed by adding the address of the instruction in the
delay slot and the 16-bit offset (shifted left two bits and
sign-extended to 32 bits). All branches occur with a delay
of one instruction.

• Conditional branch (Table 2-14): If the conditional branch is 
not taken, the instruction in the delay slot is nullified. 
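The two rules above can be sketched as follows. This is a minimal model, assuming 32-bit addresses and 4-byte instructions; the function names are illustrative, and the "likely" flag stands for the Table 2-14 instructions.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a 2's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def branch_target(branch_pc, offset16):
    """Target = address of the delay-slot instruction + (offset << 2)."""
    delay_slot_pc = branch_pc + 4
    return (delay_slot_pc + (sign_extend(offset16, 16) << 2)) & 0xFFFFFFFF


def delay_slot_executes(taken, likely):
    """Only a not-taken "likely" branch nullifies its delay slot."""
    return taken or not likely
```

An offset of 0xFFFF (that is, -1) branches back to the branch instruction itself, since the target is computed relative to the delay slot.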

The following format fields are found in Table 2-13 to Table 2-20: 

REGIMM - Opcode 

Sub - Sub-operation Code 

CO - Sub-operation Specifier 

BC - BC Sub-opcode 

br - Branch Condition Specifier 

cofun - Coprocessor Function Field 

op - Operation Code 
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Table 2-13 Branch Instruction Summary 



Instruction 


Format and Description op rs rt offset


Branch on Equal 


BEQ rs,rt,offset 

Branch to target address if register rs is equal to register rt. 


Branch on Not 
Equal 


BNE rs,rt,offset

Branch to target address if register rs is not equal to register rt. 


Branch on Less 
than or Equal Zero 


BLEZ rs,offset 

Branch to target address if register rs is less than or equal to zero. 


Branch on Greater 
Than Zero 


BGTZ rs,offset

Branch to target address if register rs is greater than zero. 


Instruction 


Format and Description REGIMM rs sub offset 


Branch on Less 
Than Zero 


BLTZ rs,offset 

Branch to target address if register rs is less than zero. 


Branch on Greater 
than or Equal Zero 


BGEZ rs,offset

Branch to target address if register rs is greater than or equal to zero.


Branch on Less 
Than Zero And 
Link 


BLTZAL rs,offset

Place address of instruction following the delay slot in register r31 (Link 

register). Branch to target address if register rs is less than zero. 


Branch on Greater 
than or Equal Zero 
And Link 


BGEZAL rs,offset 

Place address of instruction following the delay slot in register r31 (Link 

register). Branch to target address if register rs is greater than or equal to 

zero. 
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Table 2-14 Branch Instruction Summary - (ISA Extensions) 



Instruction
Format and Description    op rs rt offset

Branch on Equal Likely
BEQL rs,rt,offset
Branch to target address if register rs is equal to register rt.

Branch on Not Equal Likely
BNEL rs,rt,offset
Branch to target address if register rs is not equal to register rt.

Branch on Less Than or Equal to Zero Likely
BLEZL rs,offset
Branch to target address if register rs is less than or equal to zero.

Branch on Greater Than Zero Likely
BGTZL rs,offset
Branch to target address if register rs is greater than zero.

Instruction
Format and Description    REGIMM rs sub offset

Branch on Less Than Zero Likely
BLTZL rs,offset
Branch to target address if register rs is less than zero.

Branch on Greater Than or Equal to Zero Likely
BGEZL rs,offset
Branch to target address if register rs is greater than or equal to zero.

Branch on Less Than Zero And Link Likely
BLTZALL rs,offset
Place address of instruction following the delay slot in register r31
(Link register). Branch to target address if register rs is less than
zero.

Branch on Greater Than or Equal to Zero And Link Likely
BGEZALL rs,offset
Place address of instruction following the delay slot in register r31
(Link register). Branch to target address if register rs is greater than
or equal to zero.
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Special Instructions 

Special instructions allow the software to initiate traps and are always 
R-type. Special instructions that are valid for all MIPS R-Series 
processors are shown in Table 2-15. 
Table 2-15 Special Instructions 



Instruction
Format and Description    SPECIAL rs rd sa function

System Call
SYSCALL
Initiates system call trap, immediately transferring control to exception handler.

Breakpoint
BREAK
Initiates breakpoint trap, immediately transferring control to exception handler.



Exception Instructions

Exception instructions are extensions to the ISA and are shown in
Table 2-16 and Table 2-17.

Table 2-16 Exception Instructions (ISA Extensions)

Instruction
Format and Description    SPECIAL rs rd function

Trap if Greater Than or Equal
TGE rs,rt
Trap exception occurs if register rs is greater than or equal to register rt,
considering both quantities as signed integers.

Trap if Greater Than or Equal Unsigned
TGEU rs,rt
Trap exception occurs if register rs is greater than or equal to register rt,
considering both quantities as unsigned integers.

Trap if Less Than
TLT rs,rt
Trap exception occurs if register rs is less than register rt, considering both
quantities as signed integers.

Trap if Less Than Unsigned
TLTU rs,rt
Trap exception occurs if register rs is less than register rt, considering both
quantities as unsigned integers.

Trap if Equal
TEQ rs,rt
Trap exception occurs if register rs is equal to register rt.

Trap if Not Equal
TNE rs,rt
Trap exception occurs if register rs is not equal to register rt.
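The difference between the signed and unsigned trap conditions can be sketched as below. This is an illustrative model that assumes 64-bit register values held as unsigned Python integers; the table of lambdas and the traps helper are hypothetical names, not processor interfaces.

```python
MASK64 = (1 << 64) - 1


def as_signed(x):
    """Interpret a 64-bit register value as a 2's-complement integer."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x


TRAP_CONDITIONS = {
    "TGE":  lambda rs, rt: as_signed(rs) >= as_signed(rt),
    "TGEU": lambda rs, rt: rs >= rt,
    "TLT":  lambda rs, rt: as_signed(rs) < as_signed(rt),
    "TLTU": lambda rs, rt: rs < rt,
    "TEQ":  lambda rs, rt: rs == rt,
    "TNE":  lambda rs, rt: rs != rt,
}


def traps(mnemonic, rs, rt):
    """True if the named instruction would raise a Trap exception."""
    return TRAP_CONDITIONS[mnemonic](rs & MASK64, rt & MASK64)
```

A register holding all ones is -1 when treated as signed but the largest possible value when treated as unsigned, so TLT and TLTU disagree when it is compared against zero.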
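Table 2-17 Exception Immediate Instructions (ISA Extensions)

Instruction
Format and Description    REGIMM rs sub immediate

Trap if Greater Than or Equal Immediate
TGEI rs,immediate
Trap exception occurs if register rs is greater than or equal to sign-extended
16-bit immediate, considering both quantities as signed integers.

Trap if Greater Than or Equal Unsigned Immediate
TGEIU rs,immediate
Trap exception occurs if register rs is greater than or equal to sign-extended
16-bit immediate, considering both quantities as unsigned integers.

Trap if Less Than Immediate
TLTI rs,immediate
Trap exception occurs if register rs is less than sign-extended 16-bit
immediate, considering both quantities as signed integers.

Trap if Less Than Unsigned Immediate
TLTIU rs,immediate
Trap exception occurs if register rs is less than sign-extended 16-bit
immediate, considering both quantities as unsigned integers.

Trap if Equal Immediate
TEQI rs,immediate
Trap exception occurs if register rs is equal to immediate.

Trap if Not Equal Immediate
TNEI rs,immediate
Trap exception occurs if register rs is not equal to immediate.

Coprocessor Instructions

Coprocessor instructions perform operations in their respective
coprocessors. Coprocessor loads and stores are I-type, and
coprocessor computational instructions have coprocessor-dependent
formats. Table 2-18 summarizes the coprocessor instructions valid on
all MIPS R-Series processors; Table 2-19 summarizes those
instructions defined as extensions to the ISA.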



Coprocessor Instructions 



Coprocessor instructions perform operations in their respective 
coprocessors. Coprocessor loads and stores are I-type, and 
coprocessor computational instructions have coprocessor-dependent 
formats. Table 2-18 summarizes the coprocessor instructions valid on 
all MIPS R-Series processors; Table 2-19 summarizes those 
instructions defined as extensions to the ISA. 
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Table 2-18 Coprocessor Instruction Summary 



Instruction
Format and Description    op base offset

Load Word to Coprocessor z
LWCz rt,offset(base)
Sign-extend 16-bit offset and add to contents of register base to form
address. Load contents of addressed word into coprocessor register rt of
coprocessor unit z.

Store Word from Coprocessor z
SWCz rt,offset(base)
Sign-extend 16-bit offset and add to contents of register base to form
address. Store contents of coprocessor register rt from coprocessor unit
z at addressed memory word.

Instruction
Format and Description    COPz sub rd

Move To Coprocessor z
MTCz rt,rd
Move contents of CPU register rt into coprocessor register rd of
coprocessor unit z.

Move From Coprocessor z
MFCz rt,rd
Move contents of coprocessor register rd of coprocessor unit z into CPU
register rt.

Move Control To Coprocessor z
CTCz rt,rd
Move contents of CPU register rt into coprocessor control register rd of
coprocessor unit z.

Move Control From Coprocessor z
CFCz rt,rd
Move contents of control register rd of coprocessor unit z into CPU
register rt.

Instruction
Format and Description    COPz CO cofun

Coprocessor z Operation
COPz cofun
Coprocessor unit z performs an operation. The state of the CPU is not
modified by a coprocessor operation.

Instruction
Format and Description    COPz BC br offset

Branch on Coprocessor z True
BCzT offset
Compute a branch target address by adding the address of the instruction
in the delay slot and the 16-bit offset (shifted left two bits and sign
extended to 32 bits). Branch to the target address (with a delay of one
instruction) if coprocessor unit z's condition line is true.

Branch on Coprocessor z False
BCzF offset
Compute a branch target address by adding the address of the instruction
in the delay slot and the 16-bit offset (shifted left two bits and sign
extended to 32 bits). Branch to the target address (with a delay of one
instruction) if coprocessor unit z's condition line is false.
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Table 2-19 Coprocessor Instruction Summary (ISA Extensions) 



Instruction
Format and Description    COPz sub rd

Doubleword Move From Coprocessor z
DMFCz rt,rd
Move contents of coprocessor register rd of coprocessor unit z into CPU
register rt.

Doubleword Move To Coprocessor z
DMTCz rt,rd
Move contents of CPU register rt into coprocessor register rd of
coprocessor unit z.

Instruction
Format and Description    op base offset

Load Doubleword to Coprocessor z
LDCz rt,offset(base)
Sign-extend 16-bit offset and add to contents of register base to form
address. Load contents of addressed doubleword into coprocessor register
rt of coprocessor unit z.

Store Doubleword from Coprocessor z
SDCz rt,offset(base)
Sign-extend 16-bit offset and add to contents of register base to form
address. Store contents of coprocessor register rt from coprocessor unit z
at addressed memory word.

Instruction
Format and Description    COPz BC br offset

Branch on Coprocessor z True Likely
BCzTL offset
Compute a branch target address by adding the address of the instruction
in the delay slot and the 16-bit offset (shifted left two bits and sign
extended to 32 bits). Branch to the target address (with a delay of one
instruction) if coprocessor unit z's condition line is true. If conditional
branch is not taken, the instruction in the branch delay slot is nullified.

Branch on Coprocessor z False Likely
BCzFL offset
Compute a branch target address by adding the address of the instruction
in the delay slot and the 16-bit offset (shifted left two bits and sign
extended to 32 bits). Branch to the target address (with a delay of one
instruction) if coprocessor unit z's condition line is false. If conditional
branch is not taken, the instruction in the branch delay slot is nullified.
System Control Coprocessor (CP0) Instructions



Coprocessor instructions perform operations on the System Control
Coprocessor (CP0) registers to manipulate the memory management
and exception handling facilities of the processor. Table 2-20
summarizes the available instructions that work with CP0.

Table 2-20 System Control Coprocessor (CP0) Instruction Summary

Instruction
Format and Description    COP0 sub rt rd

Move To CP0
MTC0 rt,rd
Load the contents of CPU register rt into register rd of CP0.

Move From CP0
MFC0 rt,rd
Load the contents of CP0 register rd into CPU register rt.

Instruction
Format and Description    COP0 CO function

Read Indexed TLB Entry
TLBR
Load EntryHi, EntryLo0, and EntryLo1 registers with TLB entry pointed
to by the Index register.

Write Indexed TLB Entry
TLBWI
Load TLB entry pointed to by the Index register with the contents of the
EntryHi, EntryLo0, and EntryLo1 registers.

Write Random TLB Entry
TLBWR
Load TLB entry pointed to by the Random register with the contents of the
EntryHi, EntryLo0, and EntryLo1 registers.

Probe TLB for Matching Entry
TLBP
Load the Index register with the address of the TLB entry whose contents
match the EntryHi, EntryLo0, and EntryLo1 registers. If no TLB entry
matches, set the high-order bit of the Index register.

Return from Exception
ERET
Return from exception, interrupt, or error trap.

Instruction
Format and Description    CACHE base op offset

Cache Operation
CACHE op,offset(base)
Virtual address is formed from addition of offset and base, and this virtual
address is translated into a physical address using the TLB. Sub-opcode
op specifies a cache operation for this address.
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This chapter describes the operation of the R4000 instruction 
execution pipeline. It first describes the basic operation of the pipeline. 
It then explains how the R4000 handles delay instructions; these are 
instructions that follow a branch or load instruction in the pipeline. A 
later section explains interruptions to the pipeline flow caused by 
interlocks and exceptions. 

Basic Pipeline Operation 

The R4000 processor has an eight-stage execution pipeline. Each
pipeline stage takes one pcycle (one cycle of pclock, which runs at
twice the frequency of MasterClock). The execution of each instruction
thus takes at least eight pcycles (four MasterClock cycles). An
instruction may take longer; for example, when the required data is
not in the cache and must be retrieved from main memory. Once the
pipeline has been completely filled, eight instructions are always
being executed simultaneously.

The eight stages of the R4000 pipeline are listed below and are shown 
in Figure 3-1. 

1. Instruction Fetch, Phase 1 (IF)

2. Instruction Fetch, Phase 2 (IS)

3. Register Fetch (RF)

4. Execution (EX)

5. Data Fetch, Phase 1 (DF)

6. Data Fetch, Phase 2 (DS)

7. Tag Check (TC)

8. Write Back (WB)
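The overlap of instructions in the eight stages listed above can be sketched with a simple occupancy model. It assumes an ideal pipeline in which one instruction enters IF every pcycle and nothing stalls; the function names are illustrative.

```python
STAGES = ["IF", "IS", "RF", "EX", "DF", "DS", "TC", "WB"]


def stage_at(instr_index, pcycle):
    """Stage occupied by an instruction during a given pcycle, or None.

    Instruction i is assumed to enter IF at pcycle i.
    """
    position = pcycle - instr_index
    return STAGES[position] if 0 <= position < len(STAGES) else None


def in_flight(pcycle, total_instructions):
    """Indices of the instructions that are in the pipeline at `pcycle`."""
    return [i for i in range(total_instructions)
            if stage_at(i, pcycle) is not None]
```

Once the pipeline is full (from pcycle 7 onward in this model), in_flight always returns eight instructions, one per stage, and each instruction spends eight pcycles, or four MasterClock cycles, in the pipeline.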
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Figure 3-1 R4000 Pipeline and Instruction Overlapping 

Figure 3-2 shows the activities occurring during each pipeline stage 
for ALU, load and store, and branch instructions. The subsections 
following Figure 3-2 describe the activities during each stage in more 
detail. 
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Figure 3-2 R4000 Pipeline Activities 

IF - Instruction Fetch, First Half 

An instruction address is selected by the branch logic and the 
instruction cache fetch begins. The Instruction Translation Lookaside 
Buffer (ITLB) begins the virtual-to-physical address translation. 

IS - Instruction Fetch, Second Half 

The instruction cache fetch and the virtual-to-physical address 
translation are completed. 
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RF - Register Fetch 

The instruction decoder (IDEC) decodes the instruction and checks for 
interlock conditions. The instruction cache tag is checked against the 
page frame number obtained from the ITLB. Any required operands 
are fetched from the register file. 

EX - Execution 

For register-to-register instructions, the ALU performs the arithmetic 
or logical operation. For load and store instructions, the ALU 
calculates the data virtual address. For branch instructions, the ALU 
determines whether the branch condition is true and calculates the 
virtual branch target address. 

DF - Data Fetch, First Half 

For load and store instructions, the data cache fetch and the data 
virtual-to-physical translation begin. For branch instructions, the 
branch instruction address translation and TLB update begin. 
Register-to-register instructions perform no operations during the DF, 
DS, and TC stages. 

DS - Data Fetch, Second Half 

For load and store instructions, the data cache fetch and data virtual- 
to-physical translation are completed. The Shifter aligns the data to 
the word or doubleword boundary. For branch instructions, the 
branch instruction address translation and TLB update are completed. 

TC - Tag Check

For load and store instructions, the cache performs the tag check. The 
physical address from the TLB is checked against the cache tag to 
determine if there is a hit or a miss. 

WB - Write Back 

For register-to-register instructions, the instruction result is written 
back to the register file. Branch instructions perform no operation 
during this stage.
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Branch and Load Delay 
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The more finely grained pipeline of the R4000 results in a branch delay 
of three cycles and a load delay of two. The branch delay of three is 
easily observed by noting that the branch comparison logic operates 
during the EX pipestage of the branch, producing an instruction 
address which is available for the IF stage of the fourth subsequent
instruction. The branch delay is illustrated in Figure 3-3.
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Figure 3-3 R4000 Pipeline Branch Delay 

Similarly, the load delay of two is evident in that a load completes at
the end of its DS pipestage, producing an operand which is available
for the EX pipestage of the third subsequent instruction. The load
delay is illustrated in Figure 3-4.
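Both delays follow from the stage positions alone, which a short calculation can confirm. The sketch below assumes an ideal pipeline with one instruction entering IF per pcycle; delay_slots is a hypothetical helper name.

```python
STAGES = ["IF", "IS", "RF", "EX", "DF", "DS", "TC", "WB"]


def delay_slots(producer_stage, consumer_stage):
    """Instructions that issue between a producer and its first consumer.

    The result is usable in the pcycle after the producer stage ends.
    The first consumer is the later instruction whose consumer stage
    falls in that pcycle.
    """
    ready = STAGES.index(producer_stage) + 1
    first_consumer = ready - STAGES.index(consumer_stage)
    return first_consumer - 1


branch_delay = delay_slots("EX", "IF")   # branch resolved in EX, used by IF
load_delay = delay_slots("DS", "EX")     # load done in DS, used by EX
```

The branch result from EX reaches the IF stage of the fourth subsequent instruction, leaving three instructions in between; the load result from DS reaches the EX stage of the third subsequent instruction, leaving two.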
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Figure 3-4 R4000 Pipeline Load Delay 
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Interlock and Exception Timing 



Smooth pipeline flow is interrupted when cache accesses miss, data
dependencies are detected, or exceptions occur. Interruptions
that are handled by hardware, such as cache misses, are referred to as
interlocks, while those that are handled using software are exceptions.
Collectively, the cases of all interlock and exception conditions are
referred to as faults.

Interlocks come in two varieties. Those interlocks which are resolved
by simply stopping the pipeline are referred to as stalls, while those
which require part of the pipeline to advance while holding up
another part are slips.

At each cycle, exception and interlock conditions are checked for all
active instructions.

Because each exception or interlock condition corresponds to a
particular pipeline stage, the conditions can be referred back to
particular instructions (see Figure 3-5).
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Figure 3-5 Correspondence of Pipeline Stage to Interlock Condition 
Table 3-1 shows the correlation between the interlocks and exceptions shown in
Figure 3-5.

Table 3-1 Correspondence of Pipeline Stage to Interlock Condition 
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When an exception condition occurs, the relevant instruction and all 
that follow it in the pipeline are cancelled. Accordingly, any stall 
conditions and any later exception conditions that are referenced to 
the same instruction are inhibited; there is no value in servicing stalls 
for a cancelled instruction. A new instruction stream is begun, starting 
execution at a predefined exception vector. System control 
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coprocessor registers are loaded with information that will identify 
the type of exception and any necessary auxiliary information, such as
the virtual address at which translation exceptions occur.

When a stall condition is detected, all eight instructions, each in a
different stage of the pipeline, are frozen at once. Often, the stall
condition is only detected after parts of the pipeline have advanced
using incorrect data; we refer to this occurrence as pipeline overrun.
When in the stalled state, no pipeline stages advance until the
interlock condition is resolved. After the interlock is removed, the
restart sequence begins two cycles before resuming execution. The
restart sequence reverses the pipeline overrun condition by inserting
the correct information into the pipeline.

When a slip condition is detected, the pipeline stages which must
advance in order to resolve the dependency continue to be retired
while the dependent stages are held until the necessary data is
available.

Another class of interlocks exists which, since they originate external
to the processor, are not referenced to a particular pipeline stage.
These interlocks are referred to as external stalls and are unaffected by
the occurrence of exceptions.

In order to prevent interlock and exception handling from adversely 
affecting the processor cycle time, the R4000 uses both logic and circuit 
pipelining techniques to reduce critical timing paths. Logical 
pipelining of interlock and exception handling has the following two 
principal effects: 

1. the processor pipeline must be backed up in some cases to
recover from interlocks, and

2. in some cases interlocks will be serviced for instructions 
which will be aborted due to an exception. 

An example of the former happens in the case of data cache misses,
where the late detection of the miss causes a subsequent instruction to
compute an incorrect result. Not only must the cache miss be serviced,
but the EX stage of the dependent instruction must be redone before
the pipeline can be restarted. Figure 3-6 below illustrates this
phenomenon. A minus (-) following a pipestage descriptor indicates
that the operation performed produced an incorrect result, while a
plus (+) indicates the successful re-execution of that operation.
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Figure 3-6 Pipeline Overrun 



An example of a case in which interlocks are serviced for instructions
which will subsequently be aborted is the interaction between integer
overflow and instruction cache miss. In this case, pipelining the
overflow exception handling into the DF pipestage will allow an
instruction cache miss to occur on the immediately subsequent
instruction. This is illustrated in Figure 3-7. Aborted instructions are
denoted with an asterisk (*).

Ignoring the fact that the line brought in by the instruction cache could
be replaced by a line of the exception handler, it can be argued that no
performance loss occurs since the instruction cache miss would have
otherwise been serviced after returning from the exception handler
anyway. A more legitimate argument for handling the exception in
this fashion, however, is that the frequency of exceptions is relatively
low by definition. If this were not the case, the processor would spend
most of its time in the exception handler and no progress would be
made.
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Figure 3-7 Instruction Cache Miss 

Circuit pipelining of interlock and exception handling is 
accomplished by pipelining the logical resolution of the possible fault 
conditions with the buffering and distribution of the pipeline control 
signals. In particular, a half clock period is provided for the buffering 
and distribution of the run control signal and during this time the logic 
evaluation to produce run for the next cycle is begun. This process is 
illustrated in Figure 3-8 for a sequence of loads. 



(Timing diagram: three successive loads pass through the DF, DS, TC, and WB
stages against clock phases 1 and 2; each load's TagCk, Resolve, and Buffer
steps overlap those of its neighbors.)

Figure 3-8 Circuit Pipelining of Interlock and Exception Handling
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In the general case, resolving whether or not the pipeline advances in 
any particular cycle is done in the same three step sequence: 

1. Individually evaluate all possible fault causing events such as 
cache misses, translation exceptions, load interlocks, etc. 

2. Resolve which fault is to be serviced based on a predefined 
priority determined by the pipestage of the asserted faults. 

3. Buffer and distribute the pipeline advance control signals.

This process is illustrated in Figure 3-9.

(Timing diagram: against clock phases 1 and 2, overlapping Evaluate, Resolve,
and Buffer steps produce the Run signal for each successive cycle.)

Figure 3-9 Pipeline Advance Resolution
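The three-step sequence can be sketched in software as follows. This is a loose illustration, not a description of the actual control logic: the text states only that priority is determined by pipestage, so the rule used here (the fault in the latest pipestage, belonging to the oldest instruction, is serviced first) is an assumption, as are the function name and the fault labels.

```python
STAGES = ["IF", "IS", "RF", "EX", "DF", "DS", "TC", "WB"]


def resolve_advance(asserted_faults):
    """Decide whether the pipeline advances this cycle.

    asserted_faults maps a pipestage name to the fault raised there
    (step 1, individual evaluation, is assumed to have produced it).
    Returns (run, serviced) where serviced is (stage, fault) or None.
    """
    if not asserted_faults:
        return True, None
    # Step 2: resolve by pipestage. ASSUMPTION: the latest stage wins.
    stage = max(asserted_faults, key=STAGES.index)
    # Step 3: the run signal would now be buffered and distributed.
    return False, (stage, asserted_faults[stage])
```

For example, with a load interlock asserted in RF and a data cache miss asserted in TC during the same cycle, this model services the cache miss and holds the pipeline.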



Special Cases 



In some instances, the pipeline control state machine is bypassed. This 
bypassing occurs due to either: 

• performance considerations, or 

• correctness considerations. 

An example of the former occurs in the case of cache misses on loads.
By bypassing the pipeline state machine in this instance it is possible
to eliminate up to two cycles from the load miss latency. In this case,
it is relatively straightforward to perform the bypass since sending the
cache miss address to the secondary cache has no negative impact
even if an exception later nullifies the effect of the cache access. The
bypassing of the potential cache miss address is referred to as address
acceleration. Some power is arguably wasted when the miss is
inhibited by some fault, but this is a minor effect. Another technique
used in the R4000 to reduce miss latency is the automatic increment
and driveout of instruction miss addresses following an instruction
cache miss. This
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form of latency reduction is referred to as address prediction as the 
subsequent instruction miss address is being predicted to be a simple 
increment of the previous miss address. Figure 3-10 illustrates a cache 
miss where the cache miss address is shown changing based simply 
on detection of the miss. 
Figure 3-10 Load Address Bypassing

An example of a case where bypassing is necessary to guarantee 
correctness is cache writes. 
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The MIPS R4000 processor provides a full-featured memory 
management unit (MMU) that uses an on-chip Translation Lookaside 
Buffer (TLB) to translate virtual addresses into physical addresses. 
The MMU provides very fast virtual memory translation. This chapter 
describes the operation of the TLB and the CP0 registers that provide
the software interface to the TLB. The memory mapping scheme,
which translates virtual addresses to physical addresses, is also 
described in detail. 

Memory System Architecture 

The virtual memory system extends the address space available to
programs by translating addresses composed in a large virtual
address space into physical memory space.

The R4000 physical address space is 64 Gigabytes using a 36-bit
address. The virtual address is either 32 or 64 bits wide depending on
whether the processor is operating in 32- or 64-bit mode. In 32-bit
mode, addresses are 32 bits wide and the maximum user process size
is 2 Gigabytes (2^31 bytes). In 64-bit mode, addresses are 64 bits wide
and the maximum user process size is 1 Terabyte (2^40 bytes).
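The sizes quoted above follow directly from the address widths, as this short check shows. The variable names are illustrative; Gigabyte and Terabyte are taken as 2^30 and 2^40 bytes.

```python
GIGABYTE = 1 << 30
TERABYTE = 1 << 40


def space_bytes(address_bits):
    """Number of bytes addressable with the given number of address bits."""
    return 1 << address_bits


physical_gigabytes = space_bytes(36) // GIGABYTE   # 36-bit physical address
user32_gigabytes = space_bytes(31) // GIGABYTE     # 32-bit mode user space
user64_terabytes = space_bytes(40) // TERABYTE     # 64-bit mode user space
```

The three results are 64, 2, and 1, matching the 64-Gigabyte physical space, the 2-Gigabyte user space in 32-bit mode, and the 1-Terabyte user space in 64-bit mode.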
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Figure 4-1 R4000 32-bit Virtual Address Format 

The virtual address is extended with an Address Space Identifier
(ASID) to reduce the frequency of TLB flushing when switching
context. The size of the ASID field is 8 bits. The ASID is contained in
the CP0 EntryHi register. The CP0 EntryHi register is described in this
chapter.

Operating Modes 

This section describes the three operating modes of the R4000 for 32- 
and 64-bit operation: 

• User mode 

• Supervisor mode 

• Kernel mode 

Two of these modes are provided by all MIPS R-Series processors: 
Kernel mode, which is analogous to the "supervisory" mode provided 
by many machines, and User mode, in which nonsupervisory 
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programs are executed. The R4000 provides a third, intermediate 
mode, called Supervisor mode. This mode can be used to more easily 
build secure operating systems. 
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Figure 4-2 R4000 64-bit Virtual Address Format 

The CPU enters Kernel mode whenever an exception is detected and 
it remains in Kernel mode until an Exception Return (ERET) 
instruction is executed. The ERET instruction restores the processor to 
the mode existing prior to taking the exception. 



User Mode Virtual Addressing 



In User mode, a single, uniform virtual address space (useg) of 2
GBytes (2^31 bytes) in 32-bit mode or 1 Terabyte (2^40 bytes) in 64-bit
mode is available, as shown in Figure 4-3. Figure 4-1 and Figure 4-2
show that the virtual address is extended with an 8-bit Address Space 
Identifier (ASID) field during virtual to physical address translation to 
form unique virtual addresses for up to 256 user processes. By 
assigning each process an ASID, the system is able to maintain the TLB 
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contents across context switches. All references to useg are mapped 
through the TLB, and the cacheability of a reference is determined by 
bit settings within the TLB entry for the page. 
The User segment starts at address zero, 0x0000 0000. The TLB maps 
all references to useg identically from all modes, and controls cache 
accessibility. (The C-bits in a TLB entry determine whether the 
reference is cached; see Figure 4-1.) The current user process resides in 
useg. Figure 4-3 shows User mode address space. 
When the KSU bits equal 10, the EXL bit equals 0, and the ERL bit
equals 0 in the Status register (see Chapter 5 for a description of the
Status register), the processor is executing in User mode. The UX bit
in the Status register selects 32- or 64-bit addressing.

• useg. When UX = 0 in the Status register, User mode
addressing is compatible with the 32-bit addressing shown in
Figure 4-3. All valid User mode virtual addresses have the
most-significant bit cleared to 0; any attempt to reference
an address with the most-significant bit set while in
User mode causes an Address Error exception (see
Chapter 5). The TLB refill exception vector is used for TLB
misses.

• xuseg. When UX = 1 in the Status register, User mode
addressing is extended to the 64-bit addressing shown in
Figure 4-3. The R4000 provides a single, uniform address
space of 2^40 bytes for user processes. All valid User mode
virtual addresses have bits 63..40 equal to zero; an attempt
to reference an address with bits 63..40 not equal to zero
causes an Address Error exception (see Chapter 5). The
extended addressing TLB refill exception vector is used for
TLB misses.
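
The two User mode checks described above can be sketched in a few lines of Python. This is only an illustration of the rule, not a model of the hardware; the function name user_address_ok and its arguments are hypothetical, and addresses are treated as plain unsigned integers.

```python
def user_address_ok(vaddr: int, ux: int) -> bool:
    """Return True if vaddr is a legal User mode virtual address.

    ux is the UX bit of the Status register: 0 selects 32-bit
    addressing (useg), 1 selects 64-bit addressing (xuseg).
    A False result corresponds to an Address Error exception.
    """
    if ux == 0:
        # useg: a 32-bit address whose most-significant bit is 0.
        return 0 <= vaddr <= 0xFFFF_FFFF and (vaddr >> 31) == 0
    # xuseg: a 64-bit address whose bits 63..40 are all zero.
    return 0 <= vaddr <= 0xFFFF_FFFF_FFFF_FFFF and (vaddr >> 40) == 0
```

For example, user_address_ok(0x7FFFFFFF, 0) holds, while 0x80000000 with UX = 0 is rejected because its most-significant bit is set.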
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Figure 4-3 MIPS User Mode Virtual Address Space 



Supervisor Mode Virtual Addressing 



In the following discussion, please refer to Figure 4-4. Supervisor 
mode is intended for those layered operating system implementations 
where a "true kernel" runs in R4000 Kernel mode, and the rest of the 
operating system runs in Supervisor mode. When the KSU bits equal 01,
the EXL bit equals 0, and the ERL bit equals 0 in the Status register (see
Chapter 5 for a description of the Status register), the processor is 
executing in Supervisor mode. The SX bit in the Status register selects 
32- or 64-bit addressing. 

• suseg. When SX = 0 in the Status register and the most-significant
bit of the 32-bit virtual address is set to 0, the
virtual address space, named suseg, covers the full 2^31 bytes
(2 Gbytes) of the current user address space. The virtual
address is extended with the contents of the ASID field to
form unique virtual addresses. This mapped space starts at
virtual address 0x0000 0000 and runs up through
0x7FFF FFFF.

• sseg. When SX = 0 in the Status register and the most-significant
three bits of the 32-bit virtual address are 110,
the virtual address space selected is the current 2^29-byte
(512-Mbyte) supervisor virtual space labelled sseg. The
virtual address is extended with the contents of the ASID
field to form unique virtual addresses. This mapped space
begins at virtual address 0xC000 0000 and runs up through
0xDFFF FFFF.



• xsuseg. When SX = 1 in the Status register and bits 63..62 of
the virtual address are set to 00, the virtual address space,
named xsuseg, covers the full 2^40 bytes (1 Terabyte) of the
current user address space. The virtual address is extended
with the contents of the ASID field to form unique virtual
addresses. This mapped space starts at virtual address
0x0000 0000 0000 0000 and runs up through
0x0000 00FF FFFF FFFF.

• xsseg. When SX = 1 in the Status register and bits 63..62 of
the virtual address are set to 01, the virtual address space
selected is the current supervisor virtual space labelled
xsseg. The virtual address is extended with the contents of
the ASID field to form unique virtual addresses. This
mapped space begins at virtual address
0x4000 0000 0000 0000 and runs up through
0x4000 00FF FFFF FFFF.

• csseg. When SX = 1 in the Status register and bits 63..62 of
the virtual address are set to 11, the virtual address space
selected is the current supervisor virtual space labelled
csseg. Addressing of csseg is compatible with supervisor
addressing in 32-bit mode. This mapped space begins at
virtual address 0xFFFF FFFF C000 0000 and runs up
through 0xFFFF FFFF DFFF FFFF.
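
The Supervisor mode segments listed above can be summarized as a small lookup. The sketch below is an illustration only: the function name supervisor_segment is hypothetical, and returning None stands in for whatever the processor does with an address outside the listed ranges.

```python
def supervisor_segment(vaddr: int, sx: int):
    """Name the Supervisor mode segment containing vaddr, or None.

    sx is the SX bit of the Status register (0 = 32-bit addressing,
    1 = 64-bit addressing). The ranges are the ones listed in the text.
    """
    if sx == 0:
        if vaddr >> 31 == 0:          # most-significant bit is 0
            return "suseg"
        if vaddr >> 29 == 0b110:      # top three bits are 110
            return "sseg"
        return None
    top2 = vaddr >> 62                # bits 63..62
    if top2 == 0b00:
        return "xsuseg" if vaddr <= 0x0000_00FF_FFFF_FFFF else None
    if top2 == 0b01:
        return "xsseg" if vaddr <= 0x4000_00FF_FFFF_FFFF else None
    if 0xFFFF_FFFF_C000_0000 <= vaddr <= 0xFFFF_FFFF_DFFF_FFFF:
        return "csseg"                # 32-bit compatible supervisor space
    return None
```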
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Figure 4-4 MIPS R4000 Supervisor Mode Address Space 
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Kernel Mode Virtual Addressing 

When the processor is operating in Kernel mode (bits KSU equals 00, 
or bit EXL equals 1, or bit ERL equals 1, in the Status register) the 
virtual address space is divided into regions, differentiated by the 
high-order bits of the virtual address. The KX bit in the Status register
selects 32- or 64-bit addressing: 

• kuseg. When KX = 0 in the Status register and the most-significant
bit of the virtual address is cleared, the 32-bit
virtual address space selected covers the full 2^31 bytes (2
Gbytes) of the current user address space labelled kuseg.
The virtual address is extended with the contents of the
ASID field to form unique virtual addresses.

• kseg0. When KX = 0 in the Status register and the most-significant
three bits of the virtual address are 100, the
32-bit virtual address space selected is the current 2^29-byte
(512-Mbyte) kernel physical space labelled kseg0.
References to kseg0 are not mapped through the TLB; the
physical address selected is defined by subtracting 0x8000
0000 from the virtual address. Cacheability and coherency
are controlled by the K0 field of the Config register
described in Chapter 5, Exception Processing.

• kseg1. When KX = 0 in the Status register and the most-significant
three bits of the 32-bit virtual address are 101,
the virtual address space selected is the current 2^29-byte
(512-Mbyte) kernel physical space labelled kseg1.
References to kseg1 are not mapped through the TLB; the
physical address selected is defined by subtracting
0xA000 0000 from the virtual address. Caches are disabled
for accesses to these addresses, and physical memory (or
memory-mapped I/O device registers) is accessed
directly.

• ksseg. When KX = 0 in the Status register and the most-significant
three bits of the 32-bit virtual address are 110,
the virtual address space selected is the current 2^29-byte
(512-Mbyte) supervisor virtual space labelled ksseg. The
virtual address is extended with the contents of the ASID
field to form unique virtual addresses.

• kseg3. When KX = 0 in the Status register and the most-significant
three bits of the 32-bit virtual address are 111,
the virtual address space selected is the current 2^29-byte
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(512-Mbyte) kernel virtual space labelled kseg3. The virtual 
address is extended with the contents of the ASID field to 
form unique virtual addresses. 

• xkuseg. When KX = 1 in the Status register and bits 63..62 of
the 64-bit virtual address are 00, the virtual address space
selected covers the current user address space labelled
xkuseg. The virtual address is extended with the contents of
the ASID field to form unique virtual addresses. As a
special feature for the ECC handler, if the ERL bit of the
Status register is set, the user address region becomes a
2^31-byte unmapped, uncached space. This allows the ECC
exception code to operate uncached using R0 as a base
register.

• xksseg. When KX = 1 in the Status register and bits 63..62 of
the 64-bit virtual address are 01, the virtual address space
selected is the current supervisor virtual space labelled
xksseg. The virtual address is extended with the contents of
the ASID field to form unique virtual addresses.

• xkphys. When KX = 1 in the Status register and bits 63..62 of
the 64-bit virtual address are 10, the virtual address space
selected is a set of eight 2^36-byte kernel physical spaces
labelled xkphys. Addresses with bits 58..36 not equal to zero
cause an address error. References to this space are not
mapped; the physical address selected is taken directly
from bits 35..0 of the virtual address. The cacheability and
coherence algorithm is specified by bits 61..59 of the virtual
address (see EntryLo for the cache algorithm values).



Value  Cache Algorithm                          Starting Address
0      reserved                                 0x8000 0000 0000 0000
1      reserved                                 0x8800 0000 0000 0000
2      uncached                                 0x9000 0000 0000 0000
3      cacheable, non-coherent                  0x9800 0000 0000 0000
4      cacheable, coherent exclusive            0xA000 0000 0000 0000
5      cacheable, coherent exclusive on write   0xA800 0000 0000 0000
6      cacheable, coherent update on write      0xB000 0000 0000 0000
7      reserved                                 0xB800 0000 0000 0000

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• xkseg. When KX = 1 in the Status register and bits 63..62 of 
the 64-bit virtual address are 11, the virtual address space 
selected is the current kernel virtual space labelled
xkseg. The virtual address is extended with the contents of 
the ASID field to form unique virtual addresses. 

• ckseg0. When KX = 1 in the Status register and bits 63..62 of
the 64-bit virtual address are 11, the 64-bit virtual address
space selected is an unmapped region compatible with the
32-bit address model kseg0 when bits 61..31 of the virtual
address equal -1. Cacheability and coherency are controlled
by the K0 field of the Config register described in Chapter
5, Exception Processing.

• ckseg1. When KX = 1 in the Status register and bits 63..62 of
the 64-bit virtual address are 11, the 64-bit virtual address
space selected is an unmapped and uncached region
compatible with the 32-bit address model kseg1 when bits
61..31 of the virtual address equal -1.

• cksseg. When KX = 1 in the Status register and bits 63..62 of
the 64-bit virtual address are 11, the 64-bit virtual address
space selected is the current supervisor virtual space
compatible with the 32-bit address model ksseg when bits
61..31 of the virtual address equal -1.

• ckseg3. When KX = 1 in the Status register and bits 63..62 of 
the 64-bit virtual address are 11, the 64-bit virtual address 
space selected is kernel virtual space compatible with the 
32-bit address model kseg3 when bits 61..31 of the virtual 
address equal -1. 

Figure 4-5 shows the boundaries of the segments defined in this mode. 
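
The unmapped Kernel mode segments translate addresses by simple arithmetic rather than through the TLB. The following sketch illustrates the rules stated above for kseg0, kseg1, and xkphys; the function names are hypothetical, and the ValueError raised for a bad xkphys address merely stands in for the address error the text describes.

```python
def kseg_physical(vaddr: int):
    """Physical address for a 32-bit kseg0 or kseg1 reference, else None."""
    top3 = vaddr >> 29                 # most-significant three bits
    if top3 == 0b100:                  # kseg0: unmapped, cacheability from Config K0
        return vaddr - 0x8000_0000
    if top3 == 0b101:                  # kseg1: unmapped and uncached
        return vaddr - 0xA000_0000
    return None                        # every other segment is mapped


def xkphys_decode(vaddr: int):
    """Split a 64-bit xkphys address into (cache algorithm, physical address)."""
    if vaddr >> 62 != 0b10:
        raise ValueError("not an xkphys address")
    if (vaddr >> 36) & ((1 << 23) - 1):        # bits 58..36 must be zero
        raise ValueError("address error")
    algorithm = (vaddr >> 59) & 0b111          # bits 61..59
    physical = vaddr & ((1 << 36) - 1)         # bits 35..0
    return algorithm, physical
```

As a usage example, xkphys_decode(0x9000_0000_0000_1000) yields algorithm value 2 (uncached) and physical address 0x1000, which agrees with the starting address listed for value 2 in the table above.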
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Figure 4-5 MIPS R4000 Kernel Mode Address Space 



Virtual Memory and the TLB 
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Mapped virtual addresses are translated into physical addresses using 
an on-chip TLB. The TLB is a fully-associative memory that holds 48 
entries that provide mapping to 48 odd/even page pairs. The address
range mapped by a page can range in size from 4 Kbytes to 16 Mbytes
(increasing by multiples of 4, i.e., 4K, 16K, 64K, 256K, 1M, 4M, 16M).
When address mapping is indicated, each TLB entry is simultaneously
checked for a match with the virtual address extended by the current 
ASID stored in the EntryHi register. 
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If there is a match (a hit), the physical page number is extracted from 
the TLB and concatenated with the offset to form the physical address. 
If no match occurs (a miss), an exception is taken and software refills 
the TLB from a Page Table resident in memory. Software can write 
over a selected TLB entry or use a hardware mechanism to write into 
a random entry. 

If more than one entry in the TLB matches the virtual address being 
translated, the operation is undefined and the TLB may be shut down. 
The TLB-Shutdown (TS) bit in the Status register is set to 1 if the TLB is 
disabled. 

System Control Coprocessor 

The system control coprocessor (CP0) is implemented as an integral
part of the CPU. CP0 supports address translation, exception
handling, and other privileged operations. CP0 also contains the
registers shown in Figure 4-6 plus a 48-entry TLB. The sections that
follow describe how each of the TLB-related registers is used.

NOTE: CP0 functions and registers associated with exception handling are
described in Chapter 5, Exception Processing.

The numeral accompanying each register refers to the register
number, as described in Chapter 2, CPU Instruction Set Summary.
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CPO and the TLB 
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Figure 4-6 The R4000 CPO Registers and the TLB 



TLB Entry Format 



Figure 4-7 shows the TLB entry format for 32- and 64-bit addressing. 
Each field of an entry has a corresponding field in the EntryHi,
EntryLo0, EntryLo1, or PageMask registers, as shown in Figure 4-8.
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Figure 4-7 Format of an R4000 TLB Entry 

The formats of the EntryHi, EntryLo0, EntryLo1, and PageMask registers
are nearly the same as the 48-bit TLB entry. The one exception is the
Global field (bit 76), which the TLB uses but which is reserved in the
EntryHi register.
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24 



EntryLo0 and EntryLo1 Registers (64-bit mode)

Bits 63..30  0 (34 bits)
Bits 29..6   PFN (24 bits)
Bits 5..3    C (3 bits)
Bit 2        D
Bit 1        V
Bit 0        G

PFN  Page Frame Number. Upper bits of the physical address.
C    Specifies the cache algorithm to be used; see Table 4-1.
D    Dirty. If this bit is set, the page is marked as dirty and, therefore,
     writable. This bit is actually a write-protect bit that software can
     use to prevent alteration of data.
V    Valid. If this bit is set, it indicates that the TLB entry is valid;
     otherwise, a TLBL or TLBS Miss occurs.
G    Global. If this bit is set in both Lo0 and Lo1, then ignore the ASID
     during TLB lookup.
0    Reserved. Must be written as zeroes, returns zeroes when read.



Figure 4-8 Fields of an R4000 TLB Entry Registers 
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The cache algorithm (C) bits specify whether references to the page 
should be cached; if cached, the algorithm selects between several 
cache coherency algorithms. Table 4-1 shows the algorithms selected 
by decoding the C bits. 

Table 4-1 Cache Algorithm Bit Values

C Bit Value   Algorithm
0             reserved
1             reserved
2             uncached
3             cacheable noncoherent (noncoherent)
4             cacheable coherent exclusive (exclusive)
5             cacheable coherent exclusive on write (sharable)
6             cacheable coherent update on write (update)
7             reserved




EntryHi, EntryLo0, EntryLo1, and PageMask Registers

These registers provide the data pathway through which the TLB is 
read, written, or probed. When address translation exceptions occur, 
these registers are loaded with relevant information about the address 
that caused the exception. 

EntryHi Register (CP0 Register 10)

The EntryHi register is a read/write register used to access the TLB. In 
addition, the EntryHi register contains the current ASID value for the 
processor. This is used to match the virtual address with a TLB entry 
when virtual addresses are presented for translation. 
The EntryHi register holds the contents of the high-order bits of a TLB
entry when performing TLB read and write operations. When a TLB
refill, TLB invalid, or TLB modified exception occurs, the EntryHi
register is loaded with the Virtual Page Number (VPN) and the ASID
of the virtual address that failed to have a matching TLB entry. For
more information on TLB exceptions, see Chapter 5, Exception
Processing.

EntryHi is accessed by the TLBP, TLBWR, TLBWI, and TLBR
instructions. Figure 4-8 shows the format of this register.
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EntryLoO (2), and EntryLol (3) Registers 

EntryLo consists of two registers: EntryLo0 for even virtual pages and
EntryLo1 for odd virtual pages. The EntryLo0 and EntryLo1 registers
are read/write registers used to access the TLB. EntryLo0 and EntryLo1
hold the Physical Page Frame Number (PFN) of the TLB entry for even
and odd pages respectively when performing TLB read and write
operations. Figure 4-8 shows the format of these registers.

PageMask Register (5) 

The PageMask register is a read/write register for reading from or
writing to the TLB; it implements a variable page size by holding a
per-entry comparison mask. TLB read and write operations use this
register as a source or destination; when virtual addresses are
presented for translation, the corresponding bits in the TLB specify
which of the virtual address bits 24..13 participate in the comparison.
Figure 4-8 shows the format of the PageMask register.

Table 4-2 gives MASK values for the full range of page sizes. When
MASK is not one of these values, the operation of the TLB is
undefined.

Table 4-2 MASK Values for Page Sizes

Page size     Bit  24 23 22 21 20 19 18 17 16 15 14 13
4 Kbytes            0  0  0  0  0  0  0  0  0  0  0  0
16 Kbytes           0  0  0  0  0  0  0  0  0  0  1  1
64 Kbytes           0  0  0  0  0  0  0  0  1  1  1  1
256 Kbytes          0  0  0  0  0  0  1  1  1  1  1  1
1 Mbyte             0  0  0  0  1  1  1  1  1  1  1  1
4 Mbytes            0  0  1  1  1  1  1  1  1  1  1  1
16 Mbytes           1  1  1  1  1  1  1  1  1  1  1  1
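
The MASK values in Table 4-2 follow a regular pattern: each step up in page size sets two more low-order mask bits. The sketch below reproduces that pattern under the assumption that MASK occupies bits 24..13 of the PageMask register, as the text states; the function names are hypothetical.

```python
# The seven legal page sizes: 4 Kbytes multiplied by successive powers of 4.
PAGE_SIZES = [4 * 1024 * 4 ** n for n in range(7)]


def page_mask_field(page_size: int) -> int:
    """Return the 12-bit MASK value (bits 24..13) for a legal page size."""
    if page_size not in PAGE_SIZES:
        raise ValueError("TLB operation is undefined for this page size")
    # One mask bit is set for every address bit above bit 12
    # that belongs to the page offset rather than to the VPN.
    return (page_size >> 12) - 1


def page_mask_register(page_size: int) -> int:
    """Return the MASK value positioned at bits 24..13 of the register."""
    return page_mask_field(page_size) << 13
```

For a 16-Kbyte page this gives a MASK of binary 11, so virtual address bits 14..13 are excluded from the comparison.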
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Index Register (0) 



The Index register is a 32-bit, read/write register containing six bits 
that index an entry in the TLB. The high-order bit of the register shows 
the success or failure of a TLB Probe (TLBP) instruction (described at 
the end of this chapter). 

The Index register also specifies the TLB entry that is affected by the
TLB Read (TLBR) and TLB Write Index (TLBWI) instructions. Figure 
4-9 shows the format of the Index register. 



Index Register

Bit 31       P
Bits 30..6   0 (25 bits)
Bits 5..0    Index (6 bits)

P      Probe failure. Set to 1 when the last TLB Probe (TLBP) instruction
       was unsuccessful.
Index  Index to the TLB entry that will be affected by the TLB Read and
       TLB Write Index instructions.
0      Reserved. Must be written as zeroes, returns zeroes when read.



Figure 4-9 The Index Register 



Random Register (1) 



The Random register is a read-only register of which six bits are used
to index an entry in the TLB. This register decrements for each
instruction executed. The values range between:

• a lower bound set by the number of TLB entries reserved
for exclusive use by the operating system (the contents of
the Wired register), and

• an upper bound set by the total number of TLB entries (47
maximum).

The Random register specifies the entry in the TLB affected by the TLB
Write Random instruction, TLBWR. The register does not need to be
read for this purpose; however, the register is readable to verify
proper operation of the processor.
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To simplify testing, the Random register is set to the value of the upper 
bound upon system reset. This register is also set to the upper bound
when the Wired register is written. The format of the Random register 
is shown in Figure 4-10. 
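
The behavior of the Random register can be pictured with a small software model. The class below is a sketch only; the name RandomRegister is hypothetical, and the wrap from the lower bound back to the upper bound is an assumption inferred from the stated range, since the text does not spell out the wrap itself.

```python
TLB_ENTRIES = 48  # the R4000 TLB holds 48 entries, indexed 0..47


class RandomRegister:
    """Toy model of the Random and Wired registers."""

    def __init__(self):
        self.wired = 0                  # Wired is zero upon system reset
        self.value = TLB_ENTRIES - 1    # Random resets to the upper bound

    def write_wired(self, wired: int) -> None:
        # Writing Wired also sets Random back to its upper bound.
        self.wired = wired
        self.value = TLB_ENTRIES - 1

    def step(self) -> None:
        # One executed instruction: decrement, wrapping (assumed) to the
        # upper bound once the lower bound has been reached.
        if self.value <= self.wired:
            self.value = TLB_ENTRIES - 1
        else:
            self.value -= 1
```

With Wired set to 45, successive instructions in this model see Random values 47, 46, 45, 47, and so on, so entries 0 through 44 are never selected by TLBWR.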



Random Register

Bits 31..6   0 (26 bits)
Bits 5..0    Random (6 bits)

Random  TLB Random Index
0       Reserved. Must be written as zeroes, returns zeroes when read.



Figure 4-10 The Random Register

Wired Register (6) 

The Wired register is a read/write register that specifies the boundary
between the wired (fixed, nonreplaceable entries that cannot be 
overwritten by a TLBWR operation) and random entries of the TLB 
(see Figure 4-11). 




Figure 4-11 Wired Register Location 
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The Wired register is set to zero upon system reset Writing this 
register also sets the Random register to the value of its upper bound 
(see Random Register, above). Figure 4-12 shows the format of the 
Wired register. 



Wired Register

Bits 31..6   0 (26 bits)
Bits 5..0    Wired (6 bits)

Wired  TLB Wired boundary
0      Reserved. Must be written as zeroes, returns zeroes when read.



Figure 4-12 The Wired Register

Virtual Address Translation 

During virtual-to-physical address translation, the CPU compares the
ASID and, depending upon the page size, the highest 7 to 19 bits in
32-bit mode (VPN) or the highest 15 to 27 bits in 64-bit mode (VPN)
of the virtual address to the contents of the TLB. Figure 4-13 illustrates
the TLB address translation process.






[Flowchart: Virtual Address (Input); Address Error Exception; Unmapped
Access (Yes/No); XTLB Refill Exception; Physical Address (Output)]




Figure 4-13 R4000 TLB Address Translation 
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A virtual address matches a TLB entry when the VPN field of the 
virtual address equals the VPN field of the entry, and either the G bit 
of the TLB entry is set or the ASID field of the virtual address (the 
current ASID is held in the EntryHi register) matches the ASID field of 
the TLB entry. While the V bit of the entry must be set for a valid 
translation to take place, it is not involved in the determination of a 
matching TLB entry. 

If a TLB entry matches, the physical address and access control bits (C, 
D, and V) are retrieved from the matching TLB entry. Otherwise, a 
TLB miss exception occurs. If the access control bits (D and V) indicate 
that the access is not valid, a TLB modification or TLB invalid 
exception occurs. If the C bits equal binary 010, the physical address 
that is retrieved is used to access main memory, bypassing the cache. 
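
The matching rule and the access checks described above can be illustrated with a simplified lookup. This sketch makes several assumptions that are not part of the hardware description: it uses a fixed page offset width, ignores the even/odd page pairing and the PageMask, and simply takes the first matching entry (the text says multiple matches are undefined). The names TlbEntry and translate are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TlbEntry:
    vpn: int    # virtual page number
    asid: int   # address space identifier
    g: bool     # Global: ignore the ASID when set
    v: bool     # Valid
    d: bool     # Dirty (write enable)
    c: int      # cache algorithm bits
    pfn: int    # physical page frame number


def translate(entries, vpn, asid, offset, is_store=False, offset_bits=12):
    """Return (outcome, physical address or None) for one reference."""
    # The V bit plays no part in deciding whether an entry matches.
    matches = [e for e in entries
               if e.vpn == vpn and (e.g or e.asid == asid)]
    if not matches:
        return ("TLB miss", None)
    entry = matches[0]
    if not entry.v:
        return ("TLB invalid", None)
    if is_store and not entry.d:
        return ("TLB modification", None)
    paddr = (entry.pfn << offset_bits) | offset
    # C bits of binary 010 bypass the cache.
    return ("uncached" if entry.c == 0b010 else "cached", paddr)
```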
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TLB Instructions 



The instructions that the CPU provides for working with the TLB are 
listed in Table 4-3, and are described briefly below. 

Table 4-3 TLB Instructions

Op Code   Description
TLBP      Translation Lookaside Buffer Probe
TLBR      Translation Lookaside Buffer Read
TLBWI     Translation Lookaside Buffer Write Index
TLBWR     Translation Lookaside Buffer Write Random



Translation Lookaside Buffer Probe (TLBP). The Index register is
loaded with the address of the TLB entry whose contents match the
contents of the EntryHi register. If no TLB entry matches, the
high-order bit of the Index register is set. An instruction occurring
immediately after a TLBP instruction and causing a memory data
reference produces undefined results. Results are also undefined if a
TLB reference produces more than one hit in the TLB.

Translation Lookaside Buffer Read (TLBR). This instruction loads
the EntryHi, EntryLo0, and EntryLo1 registers with the contents of the
TLB entry specified by the contents of the Index register.

Translation Lookaside Buffer Write Index (TLBWI). This instruction
loads the specified TLB entry with the contents of the EntryHi,
EntryLo0, and EntryLo1 registers. The contents of the Index register
specify the TLB entry.

Translation Lookaside Buffer Write Random (TLBWR). This
instruction loads a pseudo-randomly-specified TLB entry with the
contents of the EntryHi, EntryLo0, and EntryLo1 registers. The contents
of the Random register specify the TLB entry.






Exception Processing 



This chapter describes the exception processing capabilities and 
hardware of the R4000. It presents an overview of the CPU exception 
handling process and describes the format and use of each CPU 
exception handling register. This chapter also describes how the 
R4000 handles each kind of exception. 
For a description of FPU exceptions, please refer to Chapter 7,
Floating-Point Exceptions.

Exception Handling Operation 

The R4000 processes exceptions from a number of sources, including 
TLB misses, arithmetic overflows, I/O interrupts, and system calls. 

When the CPU detects an exception, the normal sequence of 
instruction execution is suspended; the processor exits the current 
mode and enters Kernel mode. The processor then disables interrupts 
and forces execution of a software handler located at a fixed address. 
The handler saves the context of the processor, including the contents
of the program counter, the current operating mode (User or
Supervisor), and the status of the interrupts (enabled or disabled).
This context must be restored when the exception has been handled.

When an exception occurs, the CPU loads the Exception Program
Counter (EPC) with a restart location where execution may resume
after the exception has been serviced. The restart location in the EPC
is the address of the instruction that caused the exception or, if the
instruction was executing in a branch delay slot, the address of the
branch instruction immediately preceding the delay slot.
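
The choice of restart location can be expressed as a one-line rule. The sketch below is illustrative; the function name restart_address is hypothetical, and it relies on the fact that MIPS instructions are 4 bytes long, so the branch sits 4 bytes before its delay slot.

```python
INSTRUCTION_BYTES = 4  # every MIPS instruction is one 32-bit word


def restart_address(faulting_pc: int, in_branch_delay_slot: bool) -> int:
    """Return the value loaded into the EPC for an exception at faulting_pc."""
    if in_branch_delay_slot:
        # Restart at the branch so that it is re-executed together
        # with the instruction in its delay slot.
        return faulting_pc - INSTRUCTION_BYTES
    return faulting_pc
```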
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The Exception Handling Registers 



This section describes the CP0 registers that are used in exception
processing. Software examines the CP0 registers during exception
processing to determine the cause of an exception and the state of the
CPU at the time of an exception. Each of these registers is described in
detail in the sections that follow.

Table 5-1 CP0 Registers

Register Name                                   CP0 No.
Context                                         4
BadVAddr (Bad Virtual Address)                  8
Count                                           9
Compare register                                11
Status                                          12
Cause                                           13
EPC (Exception Program Counter)                 14
PRId (Processor Revision Identifier)            15
Config                                          16
LLAddr (Load Linked Address)                    17
WatchLo (Memory Reference Trap Address Low)     18
WatchHi (Memory Reference Trap Address High)    19
XContext                                        20
ECC                                             26
CacheErr (Cache Error and Status)               27
TagLo (Cache Tag)                               28
TagHi (Cache Tag)                               29
ErrorEPC (Error Exception Program Counter)      30



Two other CP0 registers that are part of the virtual memory
management system and contain important information about
exception handling are the Index register (CP0 register 0) and the
Random register (CP0 register 1). These two registers are described in
Chapter 4.
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Context Register (CPO Register 4) 



The Context register is a read/write register containing a pointer to
an entry in the Page Table Entry (PTE) array. This array is an operating
system data structure which stores virtual to physical address
translations. When there is a TLB miss, operating system software
handles the miss by loading the TLB with the missing translation from
the PTE array. The Context register is intended for use by the TLB
refill handler that loads entries for references to a 32-bit address space.
The Context register duplicates some of the information provided in
the BadVAddr register, but the information is in a form that is more
useful for a software TLB exception handler.

The Context register can be used by the operating system to hold a
pointer into the PTE array. The operating system sets the PTEBase
field in the register, as needed. Normally, the operating system uses
the Context register to address the current page map, which resides in
the kernel-mapped segment kseg3. This register is included solely for
use of the operating system.

For all addressing exceptions except bus errors, this register holds the
Virtual Page Number/2 (VPN2) from the most recent virtual address
for which the translation was invalid. Figure 5-1 shows the format of
the Context register.





Context Register

32-bit Mode:  PTEBase (bits 31..23, 9 bits), BadVPN2 (bits 22..4, 19 bits),
              0 (bits 3..0, 4 bits)
64-bit Mode:  PTEBase (bits 63..23, 41 bits), BadVPN2 (bits 22..4, 19 bits),
              0 (bits 3..0, 4 bits)


Figure 5-1 Context Register Format 

Bit-field descriptions of the Context register are: 

• The BadVPN2 field is written by hardware on a miss. It 
contains the VPN of the most recently translated virtual 
address that did not have a valid translation. 

• The PTEBase is a read/write field for use by the operating 
system. It is normally written with a value that allows the
operating system to use the Context register as a pointer 
into the current PTE array in memory. 
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The 19-bit BadVPN2 field contains bits 31..13 of the virtual address that 
caused the TLB miss; bit 12 is excluded because a single TLB entry 
maps to an even-odd page pair. This format can be used directly as an 
address in a table of pairs of 8-byte PTEs, for a 4-Kbyte page size. For
other page and PTE sizes, shifting and masking this value produces an 
appropriate address. 
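
As a worked example of why this format can be used directly as an address, the sketch below builds a 32-bit mode Context value from a page table base and a faulting address. It assumes the layout implied by the field widths (PTEBase in bits 31..23, BadVPN2 in bits 22..4, zeroes in bits 3..0); the function name context_value is hypothetical.

```python
def context_value(pte_base: int, bad_vaddr: int) -> int:
    """Compose the 32-bit mode Context register for a TLB miss.

    pte_base is the page table base address; only bits 31..23 are kept.
    bad_vaddr is the virtual address that missed in the TLB.
    """
    bad_vpn2 = (bad_vaddr >> 13) & 0x7FFFF     # bits 31..13 of the address
    # Shifting by 4 multiplies the index by 16, the size of one
    # pair of 8-byte PTEs, so the result points at the pair.
    return (pte_base & 0xFF80_0000) | (bad_vpn2 << 4)
```

For a table based at 0xC000 0000 and a miss at virtual address 0x0040 3ABC, BadVPN2 is 0x201 and the resulting Context value is 0xC000 2010, the address of the 16-byte PTE pair for that even/odd page pair.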

Bad Virtual Address Register (BadVAddr) (8) 

The Bad Virtual Address register (BadVAddr) is a read-only register
that displays the most recently translated virtual address that failed to
have a valid translation.

The processor does not write to the BadVAddr register when the EXL
bit in the Status register is set to 1.

Figure 5-2 shows the format of the Bad Virtual Address register.

Note: The Bad Virtual Address register does not save any information
for bus errors because they are not addressing errors.



BadVAddr Register

32-bit Mode:  Bad Virtual Address (bits 31..0, 32 bits)
64-bit Mode:  Bad Virtual Address (bits 63..0, 64 bits)



Figure 5-2 BadVAddr Register Format 

Count Register (9) 

The Count register acts as a timer, incrementing at a constant rate
whether or not an instruction is executed or retired, or any forward
progress is made. This register increments at half the maximum
instruction issue rate.

This register can be read or written; it can be written for diagnostic
purposes or system initialization to synchronize two processors
operating in lock-step.

Figure 5-3 shows the format of the Count register.
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Figure 5-3 The Count Register 

Compare Register (11) 

The Compare register implements a timer service (see also the Count
register); it maintains a stable value that does not change on its
own. When the value of the Count register equals the value of the
Compare register, interrupt bit IP7 in the Cause register is set. This
causes an interrupt to be taken on the next execution cycle in which
the interrupt is enabled. Writing a value to the Compare register, as a
side effect, clears the timer interrupt.

For diagnostic purposes, the Compare register is read/write. In normal
use, however, the Compare register is only written. Figure 5-4 shows
the format of the Compare register.
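
The interaction between the Count and Compare registers can be modeled in a few lines. This is a sketch under stated assumptions: the class name Timer is hypothetical, one call to tick stands for one Count increment, and Count is assumed to be 32 bits wide like Compare.

```python
class Timer:
    """Toy model of the Count/Compare timer interrupt."""

    MASK = 0xFFFF_FFFF   # assumed 32-bit registers

    def __init__(self):
        self.count = 0
        self.compare = 0
        self.ip7 = False          # timer interrupt pending bit in Cause

    def tick(self) -> None:
        # Count advances at a constant rate regardless of instruction flow.
        self.count = (self.count + 1) & self.MASK
        if self.count == self.compare:
            self.ip7 = True       # stays set until Compare is written

    def write_compare(self, value: int) -> None:
        # Writing Compare clears the timer interrupt as a side effect.
        self.compare = value & self.MASK
        self.ip7 = False
```

A handler built on this model would typically acknowledge the interrupt and schedule the next one in a single step by writing a new value to Compare.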



Compare Register

Compare (bits 31..0, 32 bits)



Figure 5-4 Compare Register Format




Status Register (12) 

The Status register (SR) is a read/write register that contains the 
operating mode, interrupt enabling, and the diagnostic states of the 
processor. The following list describes Status register fields that are 
used in all R-Series processors; the format of the register is shown in 
Figure 5-5 and Figure 5-6: 

• The Interrupt Mask (IM) field is an 8-bit field that controls
the enabling of eight interrupt conditions. An interrupt is
taken if interrupts are enabled, and the corresponding bits
are set in both the Interrupt Mask field of the Status
register and the Interrupt Pending field of the Cause
register. For more information, refer to the Interrupt
Pending (IP) field of the Cause register.

• The Coprocessor Usability (CU) field is a 4-bit field that
controls the usability of four possible coprocessors.
Regardless of the CU0 bit setting, CP0 is always considered
usable in Kernel mode.

• The Diagnostic Status (DS) field is a 9-bit field used for 
self-testing and checking the cache and virtual memory 
system. 

The Reverse Endian (RE) bit, bit 25, is used to reverse the endianness
of the machine in User mode. R-Series processors are configured as
either little-endian or big-endian at system reset. This selection is
used in Kernel and Supervisor modes, and in User mode when the
RE bit is 0; setting this bit to 1 inverts the selection in User mode.
Figure 5-5 shows the format of the Status register. Additional
information on the Diagnostic Status (DS) field is found in Figure 5-6.
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The Status Register 

Bit layout, most significant first: CU (bits 31..28), RP (bit 27),
FR (bit 26), RE (bit 25), DS (bits 24..16), IM (bits 15..8), KX (bit 7),
SX (bit 6), UX (bit 5), KSU (bits 4..3), ERL (bit 2), EXL (bit 1),
IE (bit 0).

CU   Controls the usability of each of the four coprocessor unit numbers
     (1 -> usable; 0 -> unusable). CP0 is always usable when in Kernel
     mode, regardless of the setting of the CU0 bit.
RP   Enables reduced-power operation by reducing the internal clock
     frequency (0 -> full speed; 1 -> reduced clock). The clock divisor is
     programmable at boot time.
FR   Enables additional floating-point registers (0 -> 16 registers;
     1 -> 32 registers).
RE   Reverse Endian in User mode.
DS   Diagnostic Status field (see Figure 5-6).
IM   Interrupt Mask: controls the enabling of each of the external,
     internal, and software interrupts (0 -> disabled, 1 -> enabled). The
     Interrupt Mask (IM) field is an 8-bit field that controls the enabling
     of eight interrupt conditions. An interrupt is taken if interrupts are
     enabled, and the corresponding bits are set in both the Interrupt Mask
     field of the Status register and the Interrupt Pending field of the
     Cause register.
KX   Enables 64-bit addressing in Kernel mode. The Extended addressing TLB
     refill exception is used for TLB misses on kernel addresses.
     (0 -> 32-bit, 1 -> 64-bit)
SX   Enables 64-bit addressing and operations in Supervisor mode. The
     Extended addressing TLB refill exception is used for TLB misses on
     supervisor addresses. (0 -> 32-bit, 1 -> 64-bit)
UX   Enables 64-bit addressing and operations in User mode. The Extended
     addressing TLB refill exception is used for TLB misses on user
     addresses. (0 -> 32-bit, 1 -> 64-bit)
KSU  Mode (10 -> User, 01 -> Supervisor, 00 -> Kernel)
ERL  Error Level (0 -> normal, 1 -> error)
EXL  Exception Level (0 -> normal, 1 -> exception)
IE   Interrupt Enable (0 -> disable, 1 -> enable)
0    Reserved. Must be written as zeroes, returns zeroes when read.



Figure 5-5 The Status Register 

The Status register contains a base mode (KSU), base interrupt enable
(IE), and two modifier bits (EXL and ERL). These bits allow support
for Supervisor mode as well as rapid TLB refill exceptions for the
kernel address space.

Interrupt Enable. Interrupts are enabled when all of the following 
field conditions are true: 

• IE is set to 1

• EXL is cleared to 0

• ERL is cleared to 0

If these conditions are met, interrupts are recognized according to the 
setting of the IM bits. 
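
As a minimal sketch, the conditions above can be written as two predicates on the raw register values. The function and macro names are hypothetical; the bit positions (IE at bit 0, EXL at bit 1, ERL at bit 2, IM at Status bits 15..8, IP at Cause bits 15..8) are assumed from the Status and Cause register formats.

```c
#include <stdint.h>
#include <stdbool.h>

/* Assumed Status register bit positions. */
#define SR_IE  (1u << 0)
#define SR_EXL (1u << 1)
#define SR_ERL (1u << 2)

/* True when IE = 1, EXL = 0, and ERL = 0. */
static bool interrupts_enabled(uint32_t sr)
{
    return (sr & SR_IE) && !(sr & SR_EXL) && !(sr & SR_ERL);
}

/* True when interrupts are enabled and at least one pending
 * interrupt (Cause IP) is also unmasked (Status IM). */
static bool interrupt_taken(uint32_t sr, uint32_t cause)
{
    uint32_t im = (sr >> 8) & 0xFF;     /* Interrupt Mask field */
    uint32_t ip = (cause >> 8) & 0xFF;  /* Interrupt Pending field */
    return interrupts_enabled(sr) && (im & ip) != 0;
}
```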

Processor Modes. The following R4000 Status register bit settings are 
required for User, Kernel, and Supervisor modes. 

• The processor is in User mode when KSU is equal to 10, 
and EXL is cleared to 0, and ERL is cleared to 0. 

• The processor is in Supervisor mode when KSU is equal to 
01, and EXL is cleared to 0, and ERL is cleared to 0. 

• The processor is in Kernel mode when KSU is equal to 00, 
or EXL is set to 1, or ERL is set to 1. 
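
The mode rules above can be expressed as a small decoder, shown here only as a sketch. The `cpu_mode` type and `decode_mode` function are hypothetical names, and the field positions (KSU at bits 4..3, ERL at bit 2, EXL at bit 1) are assumed from the Status register format. The KSU value 11 is not described in the text, so the sketch reports it separately.

```c
#include <stdint.h>

typedef enum {
    MODE_KERNEL,
    MODE_SUPERVISOR,
    MODE_USER,
    MODE_UNDEFINED   /* KSU = 11, not described in the text */
} cpu_mode;

/* Derive the operating mode from a Status register value. */
static cpu_mode decode_mode(uint32_t sr)
{
    uint32_t ksu = (sr >> 3) & 0x3;
    int exl = (sr >> 1) & 1;
    int erl = (sr >> 2) & 1;

    /* EXL or ERL forces Kernel mode regardless of KSU. */
    if (exl || erl || ksu == 0)
        return MODE_KERNEL;
    if (ksu == 1)
        return MODE_SUPERVISOR;
    if (ksu == 2)
        return MODE_USER;
    return MODE_UNDEFINED;
}
```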

32- and 64-bit Modes. The following R4000 Status register bit settings 
select 32- or 64-bit operation for User, Kernel, and Supervisor modes. 
Enabling 64-bit operation permits the execution of 64-bit opcodes and 
translation of 64-bit addresses. 64-bit operation for User, Kernel and 
Supervisor mode may be set independently. 

• 64-bit addressing is enabled for kernel mode when KX is 
set to 1. 64-bit operations are always valid in kernel mode. 

• 64-bit addressing and operations are enabled for 
supervisor mode when SX is set to 1. 

• 64-bit addressing and operations are enabled for user mode
when UX is set to 1.
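
A short sketch of the three rules above follows. The names are hypothetical, and the bit positions (KX at bit 7, SX at bit 6, UX at bit 5) are assumed from the Status register format. Note the asymmetry the text describes: in kernel mode KX gates only addressing, while 64-bit operations are always valid.

```c
#include <stdint.h>
#include <stdbool.h>

/* Assumed Status register bit positions. */
#define SR_UX (1u << 5)
#define SR_SX (1u << 6)
#define SR_KX (1u << 7)

typedef enum { ADDR_KERNEL, ADDR_SUPERVISOR, ADDR_USER } addr_mode;

/* Is 64-bit addressing enabled for the given mode? */
static bool addressing_is_64bit(uint32_t sr, addr_mode mode)
{
    switch (mode) {
    case ADDR_KERNEL:     return (sr & SR_KX) != 0;
    case ADDR_SUPERVISOR: return (sr & SR_SX) != 0;
    case ADDR_USER:       return (sr & SR_UX) != 0;
    }
    return false;
}

/* Are 64-bit operations valid for the given mode? */
static bool operations_are_64bit(uint32_t sr, addr_mode mode)
{
    if (mode == ADDR_KERNEL)
        return true;   /* always valid in kernel mode */
    return addressing_is_64bit(sr, mode);
}
```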

Kernel Address Space Accesses. Access to the Kernel address
space is allowed when the processor is in Kernel mode, that is, when
any of the following is true:

• KSU is equal to 00.

• EXL is set to 1.

• ERL is set to 1.

Supervisor Address Space Accesses. Access to the Supervisor
address space is allowed when the processor is in Kernel or Supervisor
mode, that is, when any of the following is true:

• KSU is not equal to 10 (not in User mode).

• EXL is set to 1.

• ERL is set to 1.

User Address Space Accesses. Access to the User address space is
always allowed.

Reset. The contents of the Status register are undefined at reset, except
for the following bits in the Diagnostic Status field:

• TS is cleared to 0.

• ERL and BEV are set to 1.

• SR distinguishes between Reset and Soft Reset
(Nonmaskable Interrupt [NMI]).

Figure 5-6 shows the format of the Diagnostic Status (DS) field, along
with bit descriptions. All bits in the DS field are read and write, except
TS.



The Diagnostic Status Fields

DS occupies bits 24..16 of the Status register. Bit layout, most
significant first: 0 (bits 24..23), BEV (bit 22), TS (bit 21), SR (bit 20),
0 (bit 19), CH (bit 18), CE (bit 17), DE (bit 16).

BEV  Controls the location of TLB refill and general exception vectors
     (0 -> normal; 1 -> bootstrap).
TS   TLB shutdown has occurred (read-only).
SR   A soft reset has occurred.
CH   "Hit" (tag match and valid state) or "miss" indication for last CACHE
     Hit Invalidate, Hit Write Back Invalidate, Hit Write Back, Hit Set
     Virtual, or Create Dirty Exclusive for a secondary cache.
CE   Contents of the ECC register are used to set or modify the check bits
     of the caches when CE equals 1; see the ECC register description.
DE   Specifies that cache parity or ECC errors are not to cause exceptions.
0    Reserved. Must be written as zeroes, returns zeroes when read.



Figure 5-6 R4000 Status Register DS Field 



Cause Register (13) 



The Cause register is a 32-bit read/write register. Its
contents describe the cause of the most recent exception. A 5-bit
exception code (ExcCode) indicates the cause as listed in Table 5-2. The
remaining fields contain detailed information specific to certain
exceptions. All bits in the register, with the exception of the IP(1..0)
bits, are read-only. IP(1..0) bits are used for software interrupts. Table
5-2 shows a decoding of the 5-bit Exception Code field, and
Figure 5-7 shows the format of the Cause register.
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The Cause Register 

Bit layout, most significant first: BD (bit 31), 0 (bit 30),
CE (bits 29..28), 0 (bits 27..16), IP (bits 15..8), 0 (bit 7),
ExcCode (bits 6..2), 0 (bits 1..0).

BD       Indicates whether or not the last exception was taken while
         executing in a branch delay slot (1 -> delay slot; 0 -> normal).
CE       Indicates the coprocessor unit number referenced when a
         Coprocessor Unusable exception is taken.
IP       Indicates whether an interrupt is pending.
ExcCode  This is the exception code field.
0        Reserved. Must be written as zeroes, returns zeroes when read.



Figure 5-7 Cause Register Format 
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Table 5-2 The ExcCode Field of Cause Register 



Exception
Code Value  Mnemonic  Description
0           Int       Interrupt
1           Mod       TLB modification exception
2           TLBL      TLB exception (load or instruction fetch)
3           TLBS      TLB exception (store)
4           AdEL      Address error exception (load or instruction fetch)
5           AdES      Address error exception (store)
6           IBE       Bus error exception (instruction fetch)
7           DBE       Bus error exception (data reference: load or store)
8           Sys       Syscall exception
9           Bp        Breakpoint exception
10          RI        Reserved instruction exception
11          CpU       Coprocessor Unusable exception
12          Ov        Arithmetic Overflow exception
13          Tr        Trap exception
14          VCEI      Virtual Coherency Exception Instruction
15          FPE       Floating-Point exception
16-22       -         Reserved
23          WATCH     Reference to WatchHi/WatchLo address
24-30       -         Reserved
31          VCED      Virtual Coherency Exception Data



The R4000 processor has eight interrupts, IP(7..0), which are used as
follows:

• IP(7..2): Reading the Cause register returns the inclusive OR
of two internal registers for interrupts IP(6..2). One of the
internal registers is latched each cycle from the interrupt
pins on the R4000; the other register is read and written by
commands on the system interface port. On reset, IP(7) is
configured as either a sixth external interrupt, or an
internal interrupt that is set when the Count register is
equal to the Compare register.

• IP(1..0) are software-only interrupts, and can be written to
set or reset software interrupts.
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Floating-point exceptions use a separate exception code contained in 
the Floating-Point Control and Status Registers (see Chapter 6). 
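
To illustrate how a handler might pick apart a Cause value, here is a hedged sketch. The `cause_fields` type and both function names are hypothetical, and the bit positions (BD at bit 31, CE at bits 29..28, IP at bits 15..8, ExcCode at bits 6..2) are assumed from the Cause register format; the mnemonics come from Table 5-2.

```c
#include <stdint.h>

typedef struct {
    unsigned bd;        /* branch delay flag, assumed bit 31 */
    unsigned ce;        /* coprocessor unit, assumed bits 29..28 */
    unsigned ip;        /* interrupt pending, assumed bits 15..8 */
    unsigned exc_code;  /* exception code, assumed bits 6..2 */
} cause_fields;

/* Split a raw Cause value into its fields. */
static cause_fields decode_cause(uint32_t cause)
{
    cause_fields f;
    f.bd = (cause >> 31) & 0x1;
    f.ce = (cause >> 28) & 0x3;
    f.ip = (cause >> 8) & 0xFF;
    f.exc_code = (cause >> 2) & 0x1F;
    return f;
}

/* Map an ExcCode value to its Table 5-2 mnemonic. */
static const char *exc_mnemonic(unsigned code)
{
    static const char *const names[16] = {
        "Int", "Mod", "TLBL", "TLBS", "AdEL", "AdES", "IBE", "DBE",
        "Sys", "Bp", "RI", "CpU", "Ov", "Tr", "VCEI", "FPE"
    };
    if (code < 16)
        return names[code];
    if (code == 23)
        return "WATCH";
    if (code == 31)
        return "VCED";
    return "Reserved";
}
```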



Exception Program Counter (EPC) Register (14) 

The Exception Program Counter (EPC) is a read/write register that
contains the address where processing resumes after an exception has
been serviced.

For synchronous exceptions, the EPC register contains either:

• the virtual address of the instruction that was the direct 
cause of the exception, or 

• the virtual address of the immediately preceding branch or 
jump instruction (when the instruction is in a branch delay 
slot, and the Branch Delay bit in the Cause register is set). 

The EPC register is read/write. 

The processor does not write to the EPC register when the EXL bit in 

the Status register is set to a 1. 

The format of the EPC register is shown in Figure 5-8. 
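
The two EPC cases can be illustrated with a one-line calculation. This is a sketch under stated assumptions: the function name is hypothetical, the BD bit is assumed to be bit 31 of the Cause register, and instructions are taken to be 4 bytes wide, so the delay-slot instruction sits at EPC + 4 when BD is set.

```c
#include <stdint.h>

/* Return the address of the instruction that directly caused the
 * exception, given EPC and the Cause register value. */
static uint64_t faulting_instruction_address(uint64_t epc, uint32_t cause)
{
    int bd = (cause >> 31) & 1;   /* assumed position of the BD bit */

    /* With BD set, EPC holds the branch; the faulting instruction
     * is the one in its delay slot. */
    return bd ? epc + 4 : epc;
}
```

A handler still restarts at EPC itself in both cases, so that the branch is re-executed along with its delay slot.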





The EPC Register

32-bit Mode: EPC, bits 31..0 (32 bits)

64-bit Mode: EPC, bits 63..0 (64 bits)

EPC  Address where processing is to resume.







Figure 5-8 EPC Register Format 






Processor Revision Identifier (PRId) Register (15) 

The Processor Revision Identifier (PRId) is a 32-bit, read-only register
that contains information identifying the implementation and
revision level of the CPU and CP0. Figure 5-9 shows the format of the
PRId register.



PRId Register

Bit layout, most significant first: 0 (bits 31..16, 16 bits),
Imp (bits 15..8, 8 bits), Rev (bits 7..0, 8 bits).

Imp  Implementation number.
Rev  Revision number.
0    Reserved. Must be written as zeroes, returns zeroes when read.

Figure 5-9 Processor Revision Identifier Register Format

The low-order byte (bits 7..0) of the PRId register is interpreted as a
coprocessor unit revision number, and the second byte (bits 15..8) is
interpreted as a coprocessor unit implementation number. The R4000
coprocessor implementation number is 0x04. The contents of the
high-order halfword of the register are reserved.

The revision number is a value of the form y.x, where y is a major
revision number in bits 7..4 and x is a minor revision number in bits
3..0.

The revision number can distinguish some chip revisions. However, 
MIPS does not guarantee that changes to its chips will necessarily be 
reflected in the PRId register, or that changes to the revision number 
necessarily reflect real chip changes. For this reason these values are 
not listed and software should not rely on the revision number in the 
PRId register to characterize the chip. 
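
For display or logging purposes, the fields can be split out as in the following sketch. The `prid_fields` type and `decode_prid` function are hypothetical names; the field boundaries are the ones given above.

```c
#include <stdint.h>

typedef struct {
    unsigned imp;        /* implementation number, bits 15..8 */
    unsigned rev_major;  /* major revision, bits 7..4 */
    unsigned rev_minor;  /* minor revision, bits 3..0 */
} prid_fields;

/* Split a raw PRId value into implementation and revision parts. */
static prid_fields decode_prid(uint32_t prid)
{
    prid_fields f;
    f.imp = (prid >> 8) & 0xFF;
    f.rev_major = (prid >> 4) & 0xF;
    f.rev_minor = prid & 0xF;
    return f;
}
```

In keeping with the caution above, such a decoder is suited to reporting the values rather than to selecting chip-specific behavior.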
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Config Register (16) 



The Config register specifies various configuration options selected on 
R4000 processors. Some configuration options, as defined by Config 
bits 31..6, are set by the hardware during reset, and are included in this 
register as read-only status for software. Other configuration options 
are read/ write (defined by Config bits 5..0) and controlled by software; 
on reset these fields are undefined. 

The Config register should be initialized by software before caches are 
used. The caches should be completely written back to memory before 
changing block sizes, and reinitialized after any change is made. 
Figure 5-10 shows the format of the Config register and Table 5-3 lists 
the field and bit definitions for the Config Register. 



The Config Register

Fields, most significant first: CM, EC, EP, SB, SS, SW, EW, SC, SM, BE,
EM, EB, 0, IC, DC, IB, DB, CU, K0.

Figure 5-10 Config Register Format

Table 5-3 Config Register Field and Bit Definitions

Field/Bit
Name  Description
CM    Master-Checker Mode (1 -> Master-Checker Mode is enabled). This bit
      is automatically [...] on a Soft Reset.
EC    System clock ratio:
      0 -> processor clock frequency divided by 2
      1 -> processor clock frequency divided by 3
      2 -> processor clock frequency divided by 4
EP    Transmit data pattern (pattern for write-back data):
      0 -> D         Doubleword every cycle
      1 -> DDx       2 Doublewords every 3 cycles
      2 -> DDxx      2 Doublewords every 4 cycles
      3 -> DxDx      2 Doublewords every 4 cycles
      4 -> DDxxx     2 Doublewords every 5 cycles
      5 -> DDxxxx    2 Doublewords every 6 cycles
      6 -> DxxDxx    2 Doublewords every 6 cycles
      7 -> DDxxxxx   2 Doublewords every 7 cycles
      8 -> DxxxDxxx  2 Doublewords every 8 cycles
SB    Secondary Cache block size:
      0 -> 4 words
      1 -> 8 words
      2 -> 16 words
      3 -> 32 words
SS    Split Secondary Cache Mode (0 -> instruction and data mixed in
      secondary cache; 1 -> instruction and data separated by SCAddr(17))
SW    Secondary Cache port width (0 -> 128-bit data path to SCache;
      1 -> 64-bit)
EW    System Port width (0 -> 64-bit; 1 -> 32-bit)
SC    Secondary Cache present (0 -> SCache present; 1 -> no SCache present)
SM    Dirty Shared coherency state (0 -> enabled; 1 -> Dirty Shared state
      is disabled)
BE    BigEndianMem (0 -> kernel and memory are Little Endian;
      1 -> kernel and memory are Big Endian)
EM    ECC mode enable (0 -> ECC mode enabled; 1 -> parity mode enabled)
EB    Block ordering (0 -> sequential; 1 -> sub-block)
0     Reserved. Must be written as zeroes, returns zeroes when read.
IC    Primary ICache size (ICache size = 2^(12+IC) bytes)
DC    Primary DCache size (DCache size = 2^(12+DC) bytes)
IB    Primary ICache line size (0 -> 16 bytes; 1 -> 32 bytes)
DB    Primary DCache line size (0 -> 16 bytes; 1 -> 32 bytes)
CU    Update on Store Conditional (0 -> Store Conditional uses coherency
      algorithm specified by TLB; 1 -> SC uses cacheable coherent update
      on write)
K0    kseg0 coherency algorithm (see EntryLo0 and EntryLo1 Registers)
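
The cache size and line size encodings can be decoded as in the sketch below. The function names are hypothetical. The bit positions used (IC at bits 11..9, DC at bits 8..6, IB at bit 5, DB at bit 4) are an assumption, since the figure is not legible here; they are consistent with the statement that bits 5..0 hold the software-writable fields.

```c
#include <stdint.h>

/* Primary ICache size in bytes: 2^(12+IC), IC assumed at bits 11..9. */
static uint32_t icache_bytes(uint32_t config)
{
    return 1u << (12 + ((config >> 9) & 0x7));
}

/* Primary DCache size in bytes: 2^(12+DC), DC assumed at bits 8..6. */
static uint32_t dcache_bytes(uint32_t config)
{
    return 1u << (12 + ((config >> 6) & 0x7));
}

/* Primary ICache line size in bytes, IB assumed at bit 5. */
static unsigned icache_line_bytes(uint32_t config)
{
    return ((config >> 5) & 1) ? 32 : 16;
}

/* Primary DCache line size in bytes, DB assumed at bit 4. */
static unsigned dcache_line_bytes(uint32_t config)
{
    return ((config >> 4) & 1) ? 32 : 16;
}
```

For example, an IC value of 1 gives 2^13 = 8192 bytes, that is, an 8-Kbyte primary instruction cache.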
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Load Linked Address (LLAddr) Register (17) 

The Load Linked Address (LLAddr) register is an R4000 read/write
coprocessor register that contains the physical address read by the
most recent Load Linked instruction. This register is used only for
diagnostic purposes, and serves no function during normal operation.
Figure 5-11 shows the format of the LLAddr register; PAddr represents
bits 35..4 of the physical address.



The LLAddr Register

PAddr(35..4), bits 31..0 (32 bits)

Figure 5-11 LLAddr Register Format

WatchLo (18) and WatchHi (19) Registers 

R4000 processors provide a debugging feature to detect references to 
a selected physical address; load and store operations to the location 
specified by the WatchLo and WatchHi registers cause a Watch 
exception (described later in this chapter). Figure 5-12 shows the 
format of the WatchLo and WatchHi registers. 

XContext Register (CPO Register 20) 

The XContext Register is a read/ write register containing a pointer to 
an entry in the Page Table Entry (PTE) array. This array is an operating 
system data structure which stores virtual to physical address 
translations. When there is a TLB miss, operating system software 
handles the miss by loading the TLB with the missing translation from 
the PTE array. The XContext register is intended for use with the XTLB 
refill handler, which loads TLB entries for references to a 64-bit 
address space. 

The XContext register duplicates some of the information provided in
the BadVAddr register, but the information is in a form that is more
useful for a software TLB exception handler.

The WatchLo Register

Bit layout, most significant first: PAddr0 (bits 31..3, 29 bits),
0 (bit 2), R (bit 1), W (bit 0).

The WatchHi Register

Bit layout, most significant first: 0 (bits 31..4, 28 bits),
PAddr1 (bits 3..0, 4 bits).

PAddr1  Bits 35..32 of the physical address.
PAddr0  Bits 31..3 of the physical address.
R       Trap on load references if set to 1.
W       Trap on store references if set to 1.
0       Reserved. Must be written as zeroes, returns zeroes when read.

Figure 5-12 WatchLo and WatchHi Register Formats
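
As an illustration, the pair of register values for watching one physical address could be composed as follows. The `watch_regs` type and `make_watch` function are hypothetical names; the layout assumed is PAddr0 in WatchLo bits 31..3, R in bit 1, W in bit 0, and PAddr1 in WatchHi bits 3..0.

```c
#include <stdint.h>

typedef struct {
    uint32_t lo;  /* value for WatchLo */
    uint32_t hi;  /* value for WatchHi */
} watch_regs;

/* Build WatchLo/WatchHi values for a 36-bit physical address. */
static watch_regs make_watch(uint64_t paddr, int on_load, int on_store)
{
    watch_regs w;

    /* PAddr0: physical address bits 31..3; low three bits dropped. */
    w.lo = (uint32_t)(paddr & 0xFFFFFFF8u);
    if (on_load)
        w.lo |= 1u << 1;   /* R: trap on load references */
    if (on_store)
        w.lo |= 1u << 0;   /* W: trap on store references */

    /* PAddr1: physical address bits 35..32. */
    w.hi = (uint32_t)((paddr >> 32) & 0xF);
    return w;
}
```

Because bits 2..0 of the address are not held, the watched location has doubleword (8-byte) granularity.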

The XContext register can be used by the operating system to hold a 
pointer into the PTE array. The operating system sets the PTE base 
field in the register, as needed. Normally, the operating system uses 
the Context register to address the current page map, which resides in 
the kernel-mapped segment kseg3. This register is included solely for 
use of the operating system. 

For all addressing exceptions except bus errors, this register holds the 
Virtual Page Number/2 (VPN2) from the most recent virtual address 
for which the translation was invalid. Figure 5-13 shows the format of 
the XContext register. 



XContext Register

Bit layout, most significant first: PTEBase (bits 63..33, 31 bits),
R (bits 32..31, 2 bits), BadVPN2 (bits 30..4, 27 bits),
0 (bits 3..0, 4 bits).

Figure 5-13 XContext Register Format

Bit-field descriptions of the XContext register are:

• The BadVPN2 field is written by hardware on a miss. It 
contains the VPN of the most recently translated virtual 
address that did not have a valid translation. 

• R is the Region: 00 -> user, 01 -> supervisor, 11 -> kernel.
These are bits 63..62 of the virtual address.

• The PTEBase is a read/write field for use by the operating
system. It is normally written with a value that allows the
operating system to use the Context register as a pointer
into the current PTE array in memory.

The 27-bit BadVPN2 field contains bits 39..13 of the virtual address that 
caused the TLB miss; bit 12 is excluded because a single TLB entry 
maps to an even-odd page pair. This format can be used directly as an 
address in a table of pairs of 8-byte PTEs, for a 4-Kbyte page size. For 
other page and PTE sizes, shifting and masking this value produces an 
appropriate address. 
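
The arithmetic behind this format can be checked with a short sketch. Each table entry is a pair of 8-byte PTEs (16 bytes), so the byte offset of a pair is BadVPN2 shifted left by 4, which is exactly where the field sits in the register. The function names are hypothetical, and the field positions (PTEBase at bits 63..33, R at bits 32..31, BadVPN2 at bits 30..4) are assumed from the XContext register format.

```c
#include <stdint.h>

/* Virtual address bits 39..13: the 27-bit BadVPN2 value. */
static uint64_t bad_vpn2(uint64_t vaddr)
{
    return (vaddr >> 13) & 0x7FFFFFFu;
}

/* Byte offset of the PTE pair for vaddr in a table of 16-byte pairs. */
static uint64_t pte_pair_offset(uint64_t vaddr)
{
    return bad_vpn2(vaddr) << 4;
}

/* Model of the XContext value after a miss on vaddr, given the
 * PTEBase field value written by the operating system. */
static uint64_t make_xcontext(uint64_t pte_base_field, uint64_t vaddr)
{
    uint64_t r = (vaddr >> 62) & 0x3;   /* region, address bits 63..62 */
    return (pte_base_field << 33) | (r << 31) | pte_pair_offset(vaddr);
}
```

Addresses 0x4000 and 0x5000 fall in the same even-odd page pair with 4-Kbyte pages, so both map to the same pair offset.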

Error Correction Code (ECC) Register (26) 

The Error Correction Code (ECC) register is an 8-bit read/write 
register; it reads and writes either secondary-cache data ECC bits or 
primary-cache data parity bits, for cache initialization, cache 
diagnostics, or cache error handling. (Tag ECC and parity are loaded 

from and stored to the TagLo register.) 

The ECC register is loaded by the CACHE operation Index Load Tag. 

It is:

• written into the primary data cache on store instructions 
(instead of the computed parity) when the CE bit of the 
Status register is set, 

• substituted for the computed instruction parity for the 
CACHE operation Fill, or 

• XORed into the computed ECC for the secondary cache for 
the following primary data cache CACHE operations: 
Index Write Back Invalidate, Hit Write Back, and Hit Write 
Back Invalidate. 

Figure 5-14 shows the format of the ECC register. 
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The ECC Register 



Bit layout, most significant first: 0 (bits 31..8, 24 bits),
ECC (bits 7..0, 8 bits).

ECC  An 8-bit field specifying the ECC bits read from or written to a
     secondary cache, or the even byte parity bits to be read from or
     written to a primary cache.
0    Reserved. Must be written as zeroes, returns zeroes when read.



Figure 5-14 ECC Register Format 

Cache Error Register (27) 

The CacheErr register is a 32-bit read-only register that handles ECC
errors in the secondary cache and parity errors in the primary cache.
Parity errors cannot be corrected. All single- and double-bit ECC
errors in the secondary cache tag and data are detected and single-bit
errors in the tag are automatically corrected. Single-bit ECC errors in
the secondary cache data are not automatically corrected.

The CacheErr register provides cache index and status bits which
indicate the source and nature of the error; it is loaded when a Cache
Error exception is taken. Figure 5-15 shows the format of the CacheErr
register.

The CacheErr Register

Bit layout, most significant first: ER (bit 31), EC (bit 30), ED (bit 29),
ET (bit 28), ES (bit 27), EE (bit 26), EB (bit 25), EI (bit 24),
0 (bits 23..22), SIdx (bits 21..3, 19 bits), PIdx (bits 2..0, 3 bits).

ER    Type of reference (0 -> instruction, 1 -> data).
EC    Cache level of the error (0 -> primary, 1 -> secondary).
ED    Indicates whether a data field error occurred (0 -> no error,
      1 -> error).
ET    Indicates whether a tag field error occurred (0 -> no error,
      1 -> error).
ES    Indicates that the error occurred while accessing primary or
      secondary cache in response to an external request (0 -> internal
      reference, 1 -> external reference).
EE    Set if the error occurred on the SysAD bus.
EB    Set if a data error occurred in addition to the instruction error
      (indicated by the remainder of the bits), which requires flushing the
      data cache after fixing the instruction error.
EI    Set on a secondary data cache ECC error while refilling the primary
      cache on a store miss. The ECC handler must first do an Index Store
      Tag to invalidate the incorrect data from the primary data cache.
SIdx  Bits pAddr(21..3) of the reference that encountered the error (which
      is not necessarily the same as the address of the doubleword in
      error, but is sufficient to locate that doubleword in the secondary
      cache).
PIdx  Bits vAddr(14..12) of the doubleword in error (used with SIdx to
      construct a virtual index for the primary caches).
0     Reserved. Must be written as zeroes, returns zeroes when read.



Figure 5-15 CacheErr Register Format 
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Cache Tag (TagLo, and TagHi) (28) (29) Registers 

The TagLo and TagHi registers are 32-bit read/write registers that hold 
either the primary cache tag and parity, or the secondary cache tag 
and ECC during cache initialization, cache diagnostics, or cache error 
handling. The Tag registers are written by the CACHE and MTC0
instructions.

The P and ECC fields of these registers are ignored on Index Store Tag 
operations. Parity and ECC are computed by the store operation. 
Figure 5-16 shows the format of these registers for primary cache 
operations. 




The TagLo and TagHi Registers (P-Cache)

Figure 5-16 TagLo and TagHi Register (P-Cache) Formats

Figure 5-17 shows the format of these registers for secondary cache 
operations. 
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The TagLo and TagHi Registers (S-Cache) 

TagLo: STagLo (bits 31..13), SState (bits 12..10), VIndex (bits 9..7),
ECC (bits 6..0)

TagHi: 0 (bits 31..0, 32 bits)
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Figure 5-17 TagLo and TagHi Register (S-Cache) Formats 
Bit definitions of the TagLo and TagHi registers are given in Table 5-4. 
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Table 5-4 The Cache Tag Field and Bit Definitions 



Bit Name  Description
PTagLo    A 24-bit field specifying the physical address bits 35..12.
PState    A 2-bit field specifying the primary cache state.
P         A 1-bit field specifying the primary tag even parity bit.
STagLo    A 19-bit field specifying the physical address bits 35..17.
SState    A 3-bit field specifying the secondary cache state.
VIndex    A 3-bit field specifying the virtual index of the associated
          primary cache line, vAddr(14..12).
ECC       ECC for the STag, SState, and VIndex fields.
0         Reserved. Must be written as zeroes, returns zeroes when read.



Error Exception Program Counter (Error EPC) Register (30) 

The ErrorEPC register is similar to the EPC register, but is used on ECC
and parity error exceptions. It is also used to store the PC on Reset,
Soft Reset, and NMI exceptions. The read/write ErrorEPC register
contains the virtual address at which instruction processing can
resume after servicing an error. The address may be either:

• the virtual address of the instruction that caused the 
exception, or 

• the virtual address of the immediately preceding branch or 
jump instruction when that address is in a branch delay 
slot. 

There is no branch delay slot indication for the ErrorEPC register. 
Figure 5-18 shows the format of the ErrorEPC register. 



The ErrorEPC Register

32-bit Mode: ErrorEPC, bits 31..0 (32 bits)

64-bit Mode: ErrorEPC, bits 63..0 (64 bits)

ErrorEPC  Error Exception Program Counter



Figure 5-18 ErrorEPC Register Format 
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Exception Description Details 

This section describes each of the R4000 exceptions — its cause, 
handling, and servicing. 

Exception Operation 

To handle an exception, the processor forces execution of a handler, at 
a fixed address in Kernel mode, with interrupts disabled. To resume 
normal operation, the Program Counter (PC), operating mode, and 
interrupt enable must be restored; thus it is this context that must be 
saved when an exception is taken. 

When an exception occurs, the EPC register is loaded with the restart 
location at which execution can resume after servicing the exception. 
The EPC register contains the address of the instruction that caused 
the exception; or, if the instruction was executing in a branch delay 
slot, the EPC register contains the address of the branch instruction 
immediately preceding. 

R4000 processors use the following mechanisms for saving and 
restoring the operating mode and interrupt status: 

• A single interrupt enable bit (IE) located in the Status 
Register. 

• A base operating mode (User, Supervisor, Kernel) located 
in KSU of the Status Register. 

• An exception level (normal, exception) located in EXL of 
the Status Register. 

• An error level (normal, error) located in ERL of the Status 
Register. 

Interrupts are enabled by setting the IE bit to 1 and both levels 
(exception and error) to normal. 

When the EXL bit in the Status register is zero, the User or Supervisor
operating mode is specified by the KSU bits in the Status register.
When the EXL bit is one, the processor is in Kernel mode, and
exceptions set the EXL bit to one. The exception handler typically
resets the EXL bit to zero after saving the appropriate state, and then
sets the EXL bit back to one while restoring the state and restarting.
Returning from an exception (see the ERET instruction in Appendix
A) resets the EXL bit to zero.

Figure 5-19 shows the R4000 reset exception.



T:  undefined
    Random <- TLBENTRIES-1
    Wired <- 0
    Config <- CM || EC || EP || SB || SS || SW || EW || SC || SM || BE ||
              EM || EB || 0 || 001 || 001 || undefined^6
    ErrorEPC <- PC
    SR <- SR31..23 || 1 || 0 || 0 || SR19..3 || 1 || SR1..0
    PC <- 0xBFC0 0000



Figure 5-19 R4000 Reset Exception 

Figure 5-20 shows the R4000 Soft Reset and NMI exception. 



T:  ErrorEPC <- PC
    SR <- SR31..23 || 1 || 0 || 1 || SR19..3 || 1 || SR1..0
    PC <- 0xBFC0 0000



Figure 5-20 R4000 Soft Reset and NMI Exception

Figure 5-21 shows the R4000 exceptions except Reset, Soft Reset, NMI, 
and Cache Error. 



T:  Cause <- BD || 0 || CE || 0^12 || Cause15..8 || 0 || ExcCode || 0^2
    if SR1 = 0 then
        EPC <- PC
    endif
    SR <- SR31..2 || 1 || SR0
    if SR22 = 1 then
        PC <- 0xBFC0 0200 + vector
    else
        PC <- 0x8000 0000 + vector
    endif



Figure 5-21 R4000 Exceptions (Except Reset, Soft Reset, NMI, and Cache Error)
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Figure 5-22 shows the Cache Error exception. 



T:  ErrorEPC <- PC
    CacheErr <- ER || EC || ED || ET || ES || EE || EB || EI || 0^2 ||
                SIdx || PIdx
    SR <- SR31..3 || 1 || SR1..0
    if SR22 = 1 then
        PC <- 0xBFC0 0200 + 0x100
    else
        PC <- 0xA000 0000 + 0x100
    endif



Figure 5-22 R4000 Cache Error Exception 
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Exception Vector Locations 



The Reset, Soft Reset, and NMI exceptions are always vectored to
location 0xBFC0 0000 in 32-bit mode or 0xFFFF FFFF BFC0 0000 in
64-bit mode. Addresses for other exceptions are a combination of a
vector offset and a base address, determined by the BEV bit of the Status
register. Table 5-5 shows the Vector Base addresses, and Table 5-6
shows the Vector Offset to these addresses.

Table 5-5 Exception Vector Base Addresses

       R4000 Vector Base
BEV    32-bit mode    64-bit mode
0      0x8000 0000    0xFFFF FFFF 8000 0000
1      0xBFC0 0200    0xFFFF FFFF BFC0 0200



The vector base for the Cache Error exception is in kseg1 (0xA000 0000
in 32-bit mode, 0xFFFF FFFF A000 0000 in 64-bit mode) instead of
kseg0 (0x8000 0000 in 32-bit mode, 0xFFFF FFFF 8000 0000 in 64-bit
mode) when BEV is 0. This indicates that the caches are initialized and
that the vector may be cached.

When BEV is set to 1, the vector base for the Cache Error exception is
0xBFC0 0200 in 32-bit mode and 0xFFFF FFFF BFC0 0200 in 64-bit
mode, which is uncached and unmapped. This vector does not rely on
proper cache operation.

Table 5-6 Exception Vector Offset Addresses

Exception             R4000 Vector Offset
TLB refill, EXL = 0   0x000
XTLB refill, EXL = 0  0x080
Cache Error           0x100
Others                0x180
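
Combining Table 5-5, Table 5-6, and the kseg1 rule for Cache Error, the 32-bit mode vector address can be computed as in this sketch. The `exc_kind` type and `vector_address32` function are hypothetical names. The sketch assumes that a TLB or XTLB refill taken with EXL set to 1 falls under "Others", which is how the table reads but is not stated explicitly here.

```c
#include <stdint.h>

typedef enum {
    EXC_TLB_REFILL,
    EXC_XTLB_REFILL,
    EXC_CACHE_ERROR,
    EXC_OTHER
} exc_kind;

/* Compute the 32-bit mode exception vector address. */
static uint32_t vector_address32(exc_kind kind, int bev, int exl)
{
    uint32_t base;
    uint32_t offset;

    switch (kind) {
    case EXC_TLB_REFILL:  offset = exl ? 0x180 : 0x000; break;
    case EXC_XTLB_REFILL: offset = exl ? 0x180 : 0x080; break;
    case EXC_CACHE_ERROR: offset = 0x100; break;
    default:              offset = 0x180; break;
    }

    if (bev)
        base = 0xBFC00200u;   /* bootstrap vectors */
    else if (kind == EXC_CACHE_ERROR)
        base = 0xA0000000u;   /* kseg1, uncached */
    else
        base = 0x80000000u;   /* kseg0 */

    return base + offset;
}
```

Reset, Soft Reset, and NMI are not covered by this function because they always vector to 0xBFC0 0000.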
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Priority of Exceptions 

While more than one exception can occur for a single instruction, only
one exception is reported, with priority given in the order shown in
Table 5-7.

Table 5-7 Exception Priority Order 



Reset 

Soft Reset 

NMI 

Address error — Instruction fetch 

TLB refill — Instruction fetch 

TLB invalid — Instruction fetch 

Cache error — Instruction fetch 

Virtual Coherency — Instruction fetch 

Bus error — Instruction fetch 

Integer overflow, Trap, System call, Breakpoint, 
Reserved Instruction, Coprocessor Unusable, 
or Floating-Point Exception 

Address error — Data access 

TLB refill — Data access 

TLB invalid — Data access 

TLB modified — Data write 

Cache error — Data access 

Watch 

Virtual Coherency — Data access 

Bus error — Data access 

Interrupt 




Reset Exception 

Cause. The Reset exception occurs when the ColdReset* signal is
asserted and then deasserted. This exception is not maskable.

Handling. The CPU provides a special interrupt vector (0xBFC0 0000)
for this exception. The Reset vector resides in unmapped and 
uncached CPU address space; therefore the hardware need not 
initialize the TLB or the cache to handle this exception. The processor 
can fetch and execute instructions while the caches and virtual 
memory are in an undefined state. 

The contents of all registers in the CPU are undefined when this 
exception occurs except for the following: 

• In the Status register, SR and TS are cleared to 0, and ERL 
and BEV are set to 1. Other bits are undefined. 

• The Random register is initialized to the value of its upper 
bound (see the Random register for more information). 

• The Wired register is initialized to 0. 

Servicing. The Reset exception is serviced by initializing all processor 
registers, coprocessor registers, caches, and the memory system; by 
performing diagnostic tests; and by bootstrapping the operating 
system. 

The Reset exception vector is located in the uncached, unmapped 
memory space of the machine so that instructions can be fetched and 
executed while the cache and virtual memory system are still in an 
undefined state. 

Soft Reset Exception 

Cause. The Soft Reset exception occurs in response to the Reset* input
signal, and execution begins at the Reset vector when Reset* is
deasserted. This exception is not maskable.

Handling. The Reset exception vector (0xBFC0 0000) is used for this
exception. It is located within unmapped and uncached address space
so that the cache and TLB need not be initialized to handle this
exception. The SR bit of the Status register is set to distinguish this
exception from a Reset exception.

The primary purpose of the Soft Reset exception is to reinitialize the
processor after a fatal error such as a Master/Checker mismatch.
Unlike an NMI, all cache and bus state machines are reset by this
exception; like Reset, it can be used on the processor in any state. The
caches, TLB, and normal exception vectors need not be properly
initialized.

The contents of all registers are preserved when this exception occurs,
except for:

• The ErrorEPC register, which contains the restart PC.

• The ERL bit of the Status register, which is set to 1.

• The SR bit of the Status register, which is set to 1.

• The BEV bit of the Status register, which is set to 1.

Because the Soft Reset can abort cache and bus operations, cache and
memory state is undefined when this exception occurs.

Servicing. The Soft Reset exception is serviced by saving the current
processor state for diagnostic purposes, and reinitializing for the Reset
exception.

NonMaskable Interrupt (NMI) Exception 

Cause. The NonMaskable Interrupt (NMI) exception occurs in
response to the falling edge of the NMI pin, or an external write to the
Int*[6] bit of the Interrupt register. As the name implies, this
exception is not maskable; it occurs regardless of the settings of the
EXL, ERL, and IE bits of the Status register.

Handling. The Reset exception vector (0xBFC0 0000) is also used for
this exception. This vector is located within unmapped and uncached
address space so that the cache and TLB need not be initialized to
handle an NMI. The SR bit of the Status register is set to
differentiate this exception from a Reset exception.

Because an NMI could occur in the midst of another exception, in
general it is not possible to continue program execution after servicing
an NMI.

Unlike Reset and Soft Reset, but like other exceptions, NMI is taken
only at instruction boundaries. The state of the caches and memory
system is preserved by this exception.

The contents of all registers are preserved when this exception occurs,
except for:

• The ErrorEPC register, which contains the restart PC.

• The ERL bit of the Status register, which is set to 1.

• The SR bit of the Status register, which is set to 1.

• The BEV bit of the Status register, which is set to 1.

Servicing. The NMI exception is serviced by saving the current
processor state for diagnostic purposes, and reinitializing the system
for the Reset exception.
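
Because Reset, Soft Reset, and NMI all enter at the same vector, the
code at that vector has to inspect the Status register to tell them
apart. The following is a minimal sketch, assuming the R4000 Status
register layout described elsewhere in this chapter (ERL in bit 2, SR
in bit 20, BEV in bit 22); the names reset_kind and classify_reset are
hypothetical. Soft Reset and NMI both set SR, so this test alone does
not separate them.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed Status register bit positions. */
#define ST_ERL (1u << 2)
#define ST_SR  (1u << 20)
#define ST_BEV (1u << 22)

enum reset_kind { RESET_COLD, RESET_SOFT_OR_NMI, RESET_UNEXPECTED };

/* Classifies entry at the Reset vector from a copy of the Status register. */
static enum reset_kind classify_reset(uint32_t status)
{
    /* All three exceptions set both ERL and BEV. */
    if ((status & (ST_ERL | ST_BEV)) != (ST_ERL | ST_BEV))
        return RESET_UNEXPECTED;

    /* SR is cleared by Reset and set by Soft Reset and NMI. */
    return (status & ST_SR) ? RESET_SOFT_OR_NMI : RESET_COLD;
}
```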

Address Error Exception 

Cause. The Address Error exception occurs when an attempt is made 
to: 

• Load, fetch, or store a word that is not aligned on a word 
boundary. 

• Load or store a halfword that is not aligned on a halfword 
boundary. 

• Load or store a doubleword that is not aligned on a 
doubleword boundary. 

• Reference the kernel address space from User or 
Supervisor mode. 

• Reference the Supervisor address space from User mode.

This exception is not maskable.
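
The alignment conditions in the list above amount to a single mask
test on the low-order address bits. The helper below is an
illustrative sketch only (is_misaligned is a hypothetical name) and
does not model the address space privilege checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if an access of size_bytes at vaddr is not naturally
 * aligned. size_bytes must be 1, 2, 4, or 8. */
static bool is_misaligned(uint64_t vaddr, unsigned size_bytes)
{
    assert(size_bytes == 1 || size_bytes == 2 ||
           size_bytes == 4 || size_bytes == 8);
    return (vaddr & (uint64_t)(size_bytes - 1u)) != 0;
}
```

For example, address 0x1002 is acceptable for a halfword access but
not for a word access.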

Handling. The common exception vector is used for this exception.
The AdEL or AdES code in the Cause register is set, indicating whether
the instruction (as shown by the EPC register and BD bit in the Cause
register) caused the exception with an instruction reference, load
operation, or store operation.

When this exception occurs, the BadVAddr register retains the virtual
address that was not properly aligned or that referenced protected
address space. The contents of the VPN field of the Context and
EntryHi registers are undefined, as are the contents of the EntryLo
register.

The EPC register points at the instruction that caused the exception,
unless this instruction is in a branch delay slot. If it is in a branch
delay slot, the EPC register points at the preceding branch instruction
and the BD bit of the Cause register is set to indicate this.
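
The EPC and BD convention described above recurs for most exceptions
in this chapter. A handler can recover the address of the faulting
instruction as sketched below. This is an illustration under the
assumption that BD is bit 31 of the Cause register and that the
exception code occupies bits 6..2, with AdEL = 4 and AdES = 5; the
function name faulting_pc is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed Cause register fields. */
#define CAUSE_BD          (1u << 31)
#define CAUSE_EXCCODE(c)  (((c) >> 2) & 0x1Fu)

enum { EXC_ADEL = 4, EXC_ADES = 5 };

/* Address of the instruction that actually faulted: EPC itself, or
 * EPC + 4 when the instruction sits in a branch delay slot. */
static uint32_t faulting_pc(uint32_t epc, uint32_t cause)
{
    return (cause & CAUSE_BD) ? epc + 4u : epc;
}
```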

Servicing. The process executing at the time is handed a UNIX
SIGSEGV (segmentation violation) signal. This error is usually fatal to
the process incurring the exception.

TLB Exceptions

There are three different types of TLB exceptions that can occur:

• TLB Refill occurs when there is no TLB entry to match a 
reference to a mapped address space. 

• TLB Invalid occurs when a virtual address reference 
matches a TLB entry that is marked invalid. 

• TLB Modified occurs when the virtual address of a store
operation matches a TLB entry that is marked valid but is
not dirty/writable.

TLB Refill Exception 

Cause. The TLB refill exception occurs when there is no TLB entry to
match a reference to a mapped address space. This exception is not
maskable.

Handling. Two special exception vectors are provided for this
exception: one for references to 32-bit address spaces, and one for
references to 64-bit address spaces. The UX, SX, and KX bits of the
Status register determine whether the user, supervisor, or kernel
address spaces are 32-bit or 64-bit spaces. All references use these
vectors when the EXL bit is set to 0 in the Status register.

The TLBL or TLBS code in the Cause register is set. This code indicates
whether the instruction, as shown by the EPC register and the BD bit
in the Cause register, caused the miss by an instruction reference, load
operation, or store operation.

When this exception occurs, the BadVAddr, Context, XContext, and
EntryHi registers hold the virtual address that failed address
translation. The EntryHi register also contains the ASID from which
the translation fault occurred. The Random register normally contains
a valid location in which to place the replacement TLB entry. The
contents of the EntryLo register are undefined.

The EPC register points at the instruction that caused the exception,
unless this instruction is in a branch delay slot, in which case the EPC
points at the preceding branch instruction and the BD bit of the Cause
register is set.

Servicing. To service this exception, the contents of the Context or
XContext register are used as a virtual address to fetch memory
locations containing the physical page frame and access control bits
for a pair of TLB entries. The two entries are placed into the
EntryLo0/EntryLo1 registers, and the EntryHi and EntryLo registers
are written into the TLB.

It is possible that the virtual address used to obtain the physical
address and access control information is on a page that is not resident
in the TLB. This is handled by allowing a TLB refill exception in the
TLB refill handler. This second exception goes instead to the common
exception vector because the EXL bit of the Status register is set.
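
To show why the Context register can be used directly as the address
of a page table entry pair, the sketch below models how its value is
formed in 32-bit mode. It assumes the Context layout described
elsewhere in this manual (PTEBase in bits 31..23, written by the
kernel, and BadVPN2 in bits 22..4, loaded by hardware from bits 31..13
of the failing virtual address). The function name context32 is
hypothetical, and real hardware performs this step itself.

```c
#include <assert.h>
#include <stdint.h>

/* Models the 32-bit Context register value after a TLB miss.
 * pte_base: kernel-chosen page table base (only bits 31..23 are kept).
 * bad_vaddr: virtual address that failed translation. */
static uint32_t context32(uint32_t pte_base, uint32_t bad_vaddr)
{
    uint32_t bad_vpn2 = bad_vaddr >> 13;   /* VA bits 31..13, one per page pair */

    return (pte_base & 0xFF800000u) | (bad_vpn2 << 4);
}
```

Each even/odd page pair therefore selects its own 16-byte slot in a
linear table that starts at PTEBase.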

TLB Invalid Exception 

Cause. The TLB invalid exception occurs when a virtual address
reference matches a TLB entry that is marked invalid (TLB valid bit
cleared). This exception is not maskable.

Handling. The common exception vector is used for this exception.
The TLBL or TLBS code in the Cause register is set. This code indicates
whether the instruction, as shown by the EPC register and BD bit in
the Cause register, caused the miss by an instruction reference, load
operation, or store operation.

When this exception occurs, the BadVAddr, Context, XContext, and
EntryHi registers contain the virtual address that failed address
translation. The EntryHi register also contains the ASID from which
the translation fault occurred. The Random register normally contains
a valid location in which to put the replacement TLB entry. The
contents of the EntryLo register are undefined.

The EPC register points at the instruction that caused the exception
unless this instruction is in a branch delay slot, in which case the EPC
points at the preceding branch instruction and the BD bit of the Cause
register is set.

Servicing. The valid bit of a TLB entry is typically cleared when:

• A virtual address does not exist.

• The virtual address exists, but is not in main memory (a
page fault).

• A trap is desired on any reference to the page (for example,
to maintain a reference bit).

After servicing the cause of this exception, the TLB entry is located
with TLBP (TLB Probe), and replaced by an entry with its valid bit set.

TLB Modified Exception

Cause. The TLB modified exception occurs when the virtual address of
a store operation matches a TLB entry that is marked valid but is not
dirty/writable. This exception is not maskable.

Handling. The common exception vector is used for this exception,
and the Mod code in the Cause register is set.

When this exception occurs, the BadVAddr, Context, XContext, and
EntryHi registers contain the virtual address that failed address
translation. The EntryHi register also contains the ASID from which
the translation fault occurred. The contents of the EntryLo register are
undefined.

The EPC register points at the instruction that caused the exception
unless this instruction is in a branch delay slot, in which case the EPC
points at the preceding branch instruction and the BD bit of the Cause
register is set.

Servicing. The kernel uses the failed virtual address or virtual page
number to identify the corresponding access control information. The
page identified may or may not permit write accesses; if writes are not
permitted, a Write Protection Violation has occurred.

If write accesses are permitted, the page frame is marked
dirty/writable by the kernel in its own data structures. The TLBP
instruction is used to place the index (of the TLB entry that must be
altered) into the Index register. The EntryLo register is loaded with a
word containing the physical page frame and access control bits (with
the D bit set), and the EntryHi and EntryLo registers are written into
the TLB.
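
The kernel-side decision described above can be sketched as follows.
This is an assumption-based illustration, not an actual kernel
implementation: struct soft_pte and handle_tlb_mod are hypothetical,
and the EntryLo bit positions (G in bit 0, V in bit 1, D in bit 2) are
assumed from the EntryLo format given elsewhere in this manual. The
TLBP and TLB write steps require privileged instructions and are
represented here only by the returned EntryLo value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed EntryLo control bits. */
#define ENTRYLO_G (1u << 0)
#define ENTRYLO_V (1u << 1)
#define ENTRYLO_D (1u << 2)

/* Hypothetical kernel record for one mapped page. */
struct soft_pte {
    uint32_t entrylo;   /* PFN and control bits in EntryLo format */
    bool     writable;  /* does the mapping permit stores? */
};

/* Returns false for a Write Protection Violation. Otherwise marks the
 * page dirty in the kernel's own record and produces the value to be
 * loaded into EntryLo before the TLB entry is rewritten. */
static bool handle_tlb_mod(struct soft_pte *pte, uint32_t *entrylo_out)
{
    if (!pte->writable)
        return false;

    pte->entrylo |= ENTRYLO_D;
    *entrylo_out = pte->entrylo;
    return true;
}
```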

Cache Error Exception 

Cause. The Cache Error exception occurs when either a secondary
cache ECC error or primary cache parity error is detected. This
exception is not maskable (however, error detection can be disabled by
the DE bit of the Status register).

Handling. The processor sets the ERL bit in the Status register, saves
the exception restart address in the ErrorEPC register, and then
transfers to a special vector in uncached space: 0xA000 0100 in 32-bit
mode and 0xFFFF FFFF A000 0100 in 64-bit mode if the BEV bit is 0,
otherwise 0xBFC0 0300 in 32-bit mode and 0xFFFF FFFF BFC0 0300 in
64-bit mode.

No other registers are changed.

Servicing. All errors should be logged.

Single-bit ECC errors in the secondary cache can be corrected using
the CACHE instruction, and execution resumed through ERET.
Cache parity errors and non-single-bit ECC errors in unmodified
cache blocks can be corrected by using the CACHE instruction to
invalidate the cache block, then overwriting the old data through a
cache miss and resuming execution with ERET. Other errors are not
correctable, and are likely to be fatal to the current process.

Virtual Coherency Exception 

Cause. The Virtual Coherency exception occurs when a primary cache
miss hits in the secondary cache, but bits 14..12 of the virtual address
are not equal to the corresponding bits of the PIdx field of the
secondary cache tag, and the cache algorithm for the page (from the C
field in the TLB) specifies that the page is cached. This exception is not
maskable.

Handling. The common exception vector is used for this exception.
The VCEI or VCED code in the Cause register is set for instruction and
data cache misses respectively. The BadVAddr register holds the
virtual address that caused the exception.

Servicing. The CACHE instruction can determine the old virtual
index, remove the data from the primary caches at the old virtual
index, and write the PIdx field of the secondary cache with the new
virtual index. At this point, the program can be continued.

Software can avoid the cost of this trap by using consistent virtual
primary cache indexes to access the same physical data.

Bus Error Exception 

Cause. The Bus Error exception occurs when signaled by board-level 
circuitry for events such as bus time-out, backplane bus parity errors, 
and invalid physical memory addresses or access types. This 
exception is not maskable. 

Bus Error occurs only when a cache miss refill, uncached reference, or 
unbuffered write occurs synchronously; a Bus Error resulting from a 
buffered write transaction must be reported using the general 
interrupt mechanism. 

Handling. The common interrupt vector is used for a Bus Error
exception. The IBE or DBE code in the Cause register is set, signifying
whether the instruction (as indicated by the EPC register and BD bit in
the Cause register) caused the exception by an instruction reference,
load operation, or store operation.

The EPC register points at the instruction that caused the exception,
unless it is in a branch delay slot, in which case the EPC points at the
preceding branch instruction and the BD bit of the Cause register is set.

Servicing. The physical address at which the fault occurred can be
computed from information available in the system control
coprocessor registers.

• If the IBE code in the Cause register is set (indicating an 
instruction fetch reference), the virtual address is contained 
in the EPC register. 

• If the DBE code is set (indicating a load or store reference), 
the instruction which caused the exception is located at the 
virtual address contained in the EPC register (or four plus 
the contents of the EPC register if the BD bit of the Cause 
register is set). 

The virtual address of the load or store reference can then be
obtained by interpreting the instruction. The physical address can be
obtained by using the TLBP instruction and reading the EntryLo
register to compute the physical page number.

The process executing at the time of this exception is handed a UNIX
SIGBUS (bus error) signal, which is usually fatal.
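
Interpreting a load or store instruction to recover its virtual
address means adding the sign-extended 16-bit offset to the contents
of the base register. The following sketch assumes the standard MIPS
I-type encoding (base register in bits 25..21, offset in bits 15..0)
and a saved copy of the general registers; the function name
load_store_vaddr and the gpr array are hypothetical, and only 32-bit
addresses are modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Computes the effective virtual address of a load or store.
 * insn: the 32-bit instruction word fetched from the faulting PC.
 * gpr:  general register values saved at the time of the exception. */
static uint32_t load_store_vaddr(uint32_t insn, const uint32_t gpr[32])
{
    uint32_t base   = (insn >> 21) & 0x1Fu;        /* base register number */
    int32_t  offset = (int32_t)(insn & 0xFFFFu);   /* raw 16-bit immediate */

    if (offset & 0x8000)
        offset -= 0x10000;                         /* sign-extend */

    return gpr[base] + (uint32_t)offset;
}
```

For example, the word 0x8FA8FFFC encodes a load word from offset -4
relative to register 29.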

Integer Overflow Exception 

Cause. The Integer Overflow exception occurs when an ADD, ADDI,
SUB, DADD, DADDI, or DSUB instruction results in a 2's-complement
overflow. This exception is not maskable.

Handling. The common exception vector is used for this exception.
The OV code in the Cause register is set.

The EPC register points at the instruction that caused the exception
unless the instruction is in a branch delay slot, in which case the EPC
points at the preceding branch instruction and the BD bit of the Cause
register is set.

Servicing. The process executing at the time of the exception is
handed a UNIX SIGFPE/FPE_INTOVF_TRAP (floating-point
exception/integer overflow) signal. This error is usually fatal to the
current process.

Trap Exception

Cause. The Trap exception occurs when a TGE, TGEU, TLT, TLTU,
TEQ, TNE, TGEI, TGEIU, TLTI, TLTIU, TEQI, or TNEI instruction
results in a TRUE condition. This exception is not maskable.

Handling. The common exception vector is used for this exception,
and the Tr code in the Cause register is set.

The EPC register points at the instruction causing the exception unless
the instruction is in a branch delay slot, in which case the EPC points
at the preceding branch instruction and the BD bit of the Cause register
is set.

Servicing. The process executing at the time of a Trap exception is
handed a UNIX SIGFPE/FPE_INTOVF_TRAP (floating-point
exception/integer overflow) signal. This error is usually fatal.

System Call Exception 

Cause. The System Call exception occurs on an attempt to execute the
SYSCALL instruction. This exception is not maskable.

Handling. The common exception vector is used for this exception.
The Sys code in the Cause register is set.

The EPC register points at the SYSCALL instruction unless it is in a
branch delay slot, in which case the EPC points at the preceding
branch instruction.

If the SYSCALL instruction is in a branch delay slot, the BD bit of the
Cause register is set; otherwise this bit is cleared.

Servicing. When this exception occurs, control is transferred to the
applicable system routine. To resume execution, the EPC register must
be altered so that the SYSCALL instruction is not re-executed; this is
accomplished by adding a value of four to the EPC register before
returning. If a SYSCALL instruction is in a branch delay slot, a more
complicated algorithm is required.

Breakpoint Exception 

Cause. The Breakpoint exception occurs when an attempt is made to
execute the BREAK instruction. This exception is not maskable.

Handling. The common exception vector is used for this exception,
and the BP code in the Cause register is set.

The EPC register points at the BREAK instruction unless it is in a
branch delay slot, in which case the EPC points at the preceding
branch instruction.

If the BREAK instruction is in a branch delay slot, the BD bit of the
Cause register is set; otherwise the bit is cleared.

Servicing. When the Breakpoint exception occurs, control is
transferred to the applicable system routine. Additional distinctions
can be made from the unused bits of the BREAK instruction (bits
25..6), and from loading the contents of the instruction at which the
EPC register points. (A value of four must be added to the contents of
the EPC register to locate the instruction if it resides in a branch delay
slot.)

To resume execution, the EPC register must be altered so that the
BREAK instruction is not re-executed; this is accomplished by adding
the value of four to the EPC register before returning. If a BREAK
instruction is in a branch delay slot, interpretation of the branch
instruction is required to resume execution.
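
Extracting the unused bits of the BREAK instruction is a shift and
mask on the fetched instruction word. The sketch below assumes the
standard MIPS encoding of BREAK (major opcode SPECIAL = 0, minor
opcode 0x0D in bits 5..0); the function names is_break and break_code
are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if the instruction word encodes BREAK. */
static bool is_break(uint32_t insn)
{
    return (insn >> 26) == 0u && (insn & 0x3Fu) == 0x0Du;
}

/* Returns the 20-bit code field held in bits 25..6 of a BREAK. */
static uint32_t break_code(uint32_t insn)
{
    return (insn >> 6) & 0xFFFFFu;
}
```

A handler would typically fetch the word at the address given by EPC
(plus four when the BD bit is set) and dispatch on break_code.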

Reserved Instruction Exception 

Cause. The Reserved Instruction exception occurs when an attempt is
made to execute an instruction whose major opcode (bits 31..26) is
undefined, or a SPECIAL instruction whose minor opcode (bits 5..0) is
undefined. This exception also occurs on a REGIMM instruction
whose minor opcode (bits 20..16) is undefined. A Reserved Instruction
exception can also occur if the processor attempts to execute 64-bit
operations in 32-bit mode while operating in User or Supervisor mode.
64-bit operations are always valid in Kernel mode regardless of the
value of the KX bit in the Status register. This exception is not
maskable.

Handling. The common exception vector is used for this exception,
and the RI code in the Cause register is set.

The EPC register points at the reserved instruction unless it is in a
branch delay slot, in which case the EPC points at the preceding
branch instruction.

Servicing. In current systems, no instructions in the MIPS ISA are
interpreted. The process executing at the time of this exception is
handed a UNIX SIGILL/ILL_RESOP_FAULT (illegal
instruction/reserved operand fault) signal. This error is usually fatal.

Coprocessor Unusable Exception

Cause. The Coprocessor Unusable exception occurs when an attempt
is made to execute a coprocessor instruction for either

• a corresponding coprocessor unit that has not been marked
usable, or

• CP0 instructions, when the unit has not been marked
usable and the process is executing in User mode.

This exception is not maskable.

Handling. The common exception vector is used for this exception,
and the CpU code in the Cause register is set.

The contents of the Coprocessor Usage Error field of the coprocessor
Control register indicate which coprocessor of the four was referenced.

The EPC register points at the unusable coprocessor instruction unless
it is in a branch delay slot, in which case the EPC points at the
preceding branch instruction.

Servicing. The coprocessor unit to which an attempted reference was 

made is identified by the Coprocessor Usage Error field. Results are 

one of the following: 

• If the process is entitled to access, the coprocessor is 
marked usable and the corresponding user state is restored 
to the coprocessor. 

• If the process is entitled to access the coprocessor, but the 
coprocessor does not exist or has failed, interpretation of 
the coprocessor instruction is possible. 

• If the BD bit is set in the Cause register, the branch 
instruction must be interpreted; then the coprocessor 
instruction can be emulated and execution resumed with 
the EPC register advanced past the coprocessor instruction. 

• If the process is not entitled to access the coprocessor, the
process executing at the time is handed a UNIX SIGILL/
ILL_PRIVIN_FAULT (illegal instruction/privileged
instruction fault) signal. This error is usually fatal.

Floating-Point Exception

Cause. The Floating-Point exception is used by the R4000
floating-point coprocessor. The Floating-Point exception is not
maskable.

Handling. The common exception vector is used for this exception,
and the FPE code in the Cause register is set.

The contents of the Floating-Point Control Status register indicate the
cause of this exception.

Servicing. This exception is cleared by clearing the appropriate bit in 
the Floating-Point Control Status register. For an unimplemented 
instruction exception, the kernel should emulate the instruction; for 
other exceptions, the kernel should pass the exception to the user 
program that caused the exception. 

Watch Exception 

Cause. The Watch exception occurs when a load or store instruction
references the physical address specified in the WatchLo/WatchHi
system control coprocessor registers. The WatchLo register specifies
whether a load or store initiated this exception.

The CACHE instruction never causes a Watch exception.

The Watch exception is postponed while the EXL bit is set in the Status
register, and Watch is only maskable by setting EXL in the Status
register.

Handling. The common exception vector is used for this exception,
and the Watch code in the Cause register is set.

Servicing. The Watch exception is a debugging aid; typically the
exception handler transfers control to a debugger, allowing the user to
examine the situation. To continue, the Watch exception must be
disabled to execute the faulting instruction, and then the Watch
exception must be reenabled. The faulting instruction can be executed
either by interpretation or by setting breakpoints.

Interrupt Exception 

Cause. The Interrupt exception occurs when one of the eight interrupt
conditions is asserted. The significance of these interrupts is
dependent upon the specific system implementation.

Each of the eight interrupts can be masked by clearing the
corresponding bit in the Int-Mask field of the Status register, and all of
the eight interrupts can be masked at once by clearing the IE bit of the
Status register.

Handling. The common exception vector is used for this exception,
and the Int code in the Cause register is set.

The IP field of the Cause register indicates the current interrupt
requests. It is possible that more than one of the bits will be
simultaneously set (or even no bits may be set) if an interrupt is
asserted and then deasserted before this register is read.

Servicing. If the interrupt is caused by one of the two
software-generated exceptions (SW1 or SW0), the interrupt condition
is cleared by setting the corresponding Cause register bit to 0.

If the interrupt is hardware-generated, the interrupt condition is
cleared by correcting the condition causing the interrupt pin to be
asserted.
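
The masking rules above reduce to a bitwise AND of the pending and
mask fields, gated by the global enable. The sketch below is an
illustration under stated assumptions: the IP field of the Cause
register and the mask field of the Status register are both taken to
occupy bits 15..8, IE is bit 0, and interrupts are also assumed to be
held off while EXL (bit 1) or ERL (bit 2) is set. The function name
deliverable_interrupts is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed Status register bits. */
#define ST_IE  (1u << 0)
#define ST_EXL (1u << 1)
#define ST_ERL (1u << 2)

/* Returns an 8-bit mask of interrupts that are both pending and
 * enabled, or 0 if interrupts are globally disabled. Bit n of the
 * result corresponds to IP[n]. */
static uint32_t deliverable_interrupts(uint32_t status, uint32_t cause)
{
    if (!(status & ST_IE) || (status & (ST_EXL | ST_ERL)))
        return 0;

    return ((cause & status) >> 8) & 0xFFu;
}
```

A result of 0 with the Int code set corresponds to the case described
above in which a request was deasserted before the register was read.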






Floating-Point Unit 



Functional Overview 



The MIPS Floating-Point Unit (FPU) operates as a coprocessor for the 
CPU and extends the CPU instruction set to perform arithmetic 
operations on values in floating-point representations. The FPU, with 
associated system software, fully conforms to the requirements of 
ANSI/IEEE Standard 754-1985, "IEEE Standard for Binary Floating- 
Point Arithmetic." In addition, the MIPS architecture fully supports 
the recommendations of the standard and precise exceptions. Figure
6-1 illustrates the functional organization of the FPU.

Figure 6-1 FPU Functional Block Diagram

FPU Features



• Full 64-bit operation. When the FR bit in the Status register 
equals zero, the FPU contains thirty-two 32-bit registers 
that hold single- or, when used in pairs, double-precision 
values. When the FR bit in the Status register equals one, 
the FPU registers are expanded to 64 bits wide. Each 
register can hold single- or double-precision values. The 
FPU also includes a 32-bit Status/Control register that 
provides access to all IEEE-Standard exception handling 
capabilities. 

• Load and Store Instruction Set. Like the CPU, the FPU
uses a load- and store-oriented instruction set, with
single-cycle load and store operations. Floating-point
operations are started in a single cycle and their execution
is overlapped with other fixed-point or floating-point
operations.

• Tightly coupled Coprocessor Interface. The FPU resides
on-chip to form a tightly coupled unit with a seamless
integration of floating-point and fixed-point instruction
sets. Since each unit receives and executes instructions in
parallel, some floating-point instructions can execute at the
same single-cycle per instruction rate as fixed-point
instructions.

FPU Programming Model 

This section describes the organization of data in registers and the set 
of general registers available. This section also gives a summary 
description of the FPU registers. 

Floating-Point General Registers (FGRs) 

The FPU has a set of Floating-Point General-Purpose registers (FGRs) 
and two control registers: the Control/Status and Implementation/ 
Revision registers. The general registers can be accessed in three 
different ways: 

• As thirty-two general-purpose registers (32 FGRs), each
32 bits wide when the FR bit in the Status register equals
zero, or 64 bits wide when FR equals one. The CPU accesses
the general registers as FGRs through move, load, and store
instructions.

• As sixteen floating-point registers (FPRs), each 64 bits
wide, when the FR bit in the Status register equals zero.
The FPRs hold floating-point values during floating-point
operations, in either single- or double-precision
floating-point format. The FPU accesses the general
registers as FPRs. Each FPR corresponds to adjacently
numbered FGRs as shown in Figure 6-2. Only even
numbers are used to address FPRs; odd FPR register
numbers are invalid. During single-precision floating-point
operations, only the even-numbered (least, as shown in
Figure 6-2) general registers are used, and during
double-precision operations, the general registers are
accessed in double pairs.

• As thirty-two floating-point registers (FPRs), each 64 bits
wide, when the FR bit in the Status register equals one.
The FPRs hold floating-point values during floating-point
operations, in either single- or double-precision
floating-point format. The FPU accesses the general
registers as FPRs. Each FPR corresponds to an FGR as
shown in Figure 6-2. Both even and odd register numbers
are valid to address FPRs. During single-precision
floating-point operations, the low-order words of the
general registers are used; during double-precision
operations the general registers are accessed as 64-bit
registers.

Figure 6-2 FPU Registers

Floating-Point Registers

The FPU provides 16 Floating-Point registers (FPRs) when the FR bit in
the Status register equals zero or 32 Floating-Point registers (FPRs)
when the FR bit in the Status register equals one. These 64-bit registers
hold floating-point values during floating-point operations and are
physically formed from the General-Purpose registers (FGRs). When the
FR bit in the Status register equals one, the FPR references a single
64-bit FGR.

The FPRs hold values in either single- or double-precision
floating-point format. Only even numbers are used to address FPRs;
odd FPR register numbers are invalid unless the FR bit is set to one.
When this bit is set, all FPR register numbers are valid. During
single-precision floating-point operations when the FR bit is not set,
only the even-numbered (least, as shown in Figure 6-2) general
registers are used, and during double-precision floating-point
operations the general registers are accessed in double pairs. Thus, in
a double-precision operation, selecting Floating-Point Register 0 (FPR0)
addresses adjacent Floating-Point General-Purpose registers FGR0 and
FGR1.
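
The register pairing for FR equal to zero can be modeled in software
as shown below. This sketch assumes that the even-numbered FGR holds
the least significant word, as the text indicates, and that the host
on which the model runs represents double as an IEEE 754 64-bit value
with the same byte order as its integers. The function name
read_fpr_double_fr0 and the fgr array are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reads the double-precision value held in FPR `fpr` when FR = 0.
 * fgr: model of the thirty-two 32-bit general registers.
 * fpr: must be even; it selects the pair FGR[fpr] and FGR[fpr + 1]. */
static double read_fpr_double_fr0(const uint32_t fgr[32], unsigned fpr)
{
    uint64_t bits;
    double value;

    assert(fpr < 32 && (fpr & 1u) == 0);

    /* Even register: low word. Odd register: high word. */
    bits = ((uint64_t)fgr[fpr + 1] << 32) | fgr[fpr];
    memcpy(&value, &bits, sizeof value);
    return value;
}
```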

Floating-Point Control Registers 

The R4000 FPU has 32 control registers. The following Floating-Point 
Control registers (FCRs) can be accessed only by move operations. The 
registers are described below: 

• The Control/Status register (FCR31) controls and monitors 
exceptions, holds the result of compare operations, and 
establishes rounding modes. 

• The Implementation/Revision register (FCR0) holds revision
information about the FPU.

Table 6-1 lists the assignments of the FCRs.

Table 6-1 Floating-Point Control Register Assignments

FCR Number    Use
FCR0          Coprocessor implementation and revision register
FCR1-30       Reserved
FCR31         Rounding mode, cause, trap enables, and flags

Control/Status Register FCR31 (Read and Write)

The Control/Status register, FCR31, contains control and status data
and can be accessed by instructions in either Kernel or User mode. It
controls the arithmetic rounding mode and the enabling of User-mode
traps. It also identifies exceptions that occurred in the most recently
executed instruction and any exceptions that may have occurred
without being trapped. Figure 6-3 shows the bit assignments of
FCR31.

FS       When set, denormalized results are flushed to zero instead
         of causing an unimplemented operation exception.
C        Condition bit. See description below.
Cause    Cause bits. See Figure 6-4 and the description of
         Control/Status Register Cause, Flag, and Enable Bits.
Enables  Enable bits. See Figure 6-4 and the description of
         Control/Status Register Cause, Flag, and Enable Bits.
Flags    Flag bits. See Figure 6-4 and the description of
         Control/Status Register Cause, Flag, and Enable Bits.
RM       Rounding mode bits. See Table 6-2 and the section
         Control/Status Register Rounding Mode Control Bits.

Figure 6-3 FP Control/Status Register Bit Assignments

Figure 6-4 Control/Status Register Cause/Flag/Enable Bits

When the Control/Status register is read using a Move Control From
Coprocessor 1 (CFC1) instruction, all unfinished instructions in the
pipeline are completed before the contents of the register are moved
to the main processor. If a floating-point exception occurs as the
pipeline empties, the exception is taken and the CFC1 instruction can
be re-executed after the exception is serviced.

The bits in the Control/Status register can be set or cleared by writing
to the register using a Move Control To Coprocessor 1 (CTC1)
instruction. This register must only be written to when the FPU is not
actively executing floating-point operations. This can be ensured by
first reading the contents of the register to empty the pipeline.

IEEE Standard 754. IEEE Standard 754 specifies that floating-point
operations detect certain exceptional cases, raise flags, and optionally
invoke an exception handler when an exception occurs. These features
are implemented in the MIPS architecture with the Cause, Enable, and
Flag fields of the Control/Status register. The Flag bits implement
IEEE 754 exception status flags, and the Cause and Enable bits
implement exception handling.

Control/Status Register FS Bit. Bit 24 of the Control/Status register is
the FS bit. When this bit is set, denormalized results are flushed to zero
instead of causing an unimplemented operation exception.

Control/Status Register Condition Bit. Bit 23 of the Control/Status
register is the Condition bit. When a floating-point Compare operation
takes place, the result is stored at bit 23 so that the state of the
condition line can be saved or restored. The C bit is set to 1 if the
condition is true; the bit is cleared to 0 if the condition is false. Bit 23
is affected only by compare and Move Control To FPU instructions.

Control/Status Register Cause, Flag, and Enable Bits 

Figure 6-4 illustrates the Cause, Flag, and Enable bit assignments in
the Control/Status register.

Bits 17..12 in the Control/Status register contain Cause bits, as shown in
Figure 6-4, which reflect the results of the most recently executed
instruction. The Cause bits are a logical extension of the CP0 Cause
register; they identify the exceptions raised by the last floating-point
operation and raise an interrupt or exception if the corresponding
enable bit is set.

The Cause bits are written by each floating-point operation (but not by
load, store, or move operations). Unimplemented Operation (E) is set
to 1 if software emulation is required; otherwise it remains 0. The
other bits are set to 1 or 0 to indicate the occurrence or non-occurrence
(respectively) of an IEEE 754 exception.

A floating-point exception is generated any time a Cause bit and the
corresponding Enable bit are both set. A floating-point operation that
sets an enabled Cause bit forces an immediate exception, as does
setting both Cause and Enable bits with CTC1.

There is no enable for Unimplemented Operation (E). Setting
Unimplemented Operation always generates a floating-point
exception.

When a floating-point exception is taken, no results are stored, and the
only states affected are the Cause and Flag bits. Exceptions caused by
an immediately previous floating-point operation can be determined
by reading the Cause field.

Before returning from a floating-point exception, or doing a CTC1,
software must first clear the enabled Cause bits to prevent a repeat of
the interrupt. Thus, User-mode programs can never observe enabled
Cause bits set; if this information is required in a User-mode handler,
it must be passed somewhere other than the Control/Status register.

The appropriate Flag bits should be set when a User-mode
exception handler is invoked. This is not implemented in hardware;
floating-point exception software is responsible for setting these bits
before invoking a user handler.

For a floating-point operation that sets only unenabled Cause bits, no
exception occurs and the default result defined by IEEE 754 is stored.
In this case, the exceptions that were caused by the immediately
previous floating-point operation can be determined by reading the
Cause field.

Figure 6-4 shows the meanings of each bit in the Cause field. If more
than one exception occurs on a single instruction, each appropriate bit
will be set.

The Flag bits are cumulative and indicate that an exception was raised
on some operation since they were explicitly reset. Flag bits are set to
1 if an IEEE 754 exception is raised, and remain unchanged otherwise.
The Flag bits are never cleared as a side effect of floating-point
operations, but can be set or cleared by writing a new value into the
Control/Status register using a Move Control To Coprocessor 1 (CTC1)
instruction.
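
The field layout and the exception rule above can be sketched as
follows. The positions of FS (bit 24), C (bit 23), Cause (bits 17..12),
and RM (bits 1..0) come from the text. The positions of the Enable bits
(11..7) and Flag bits (6..2), and the ordering E, V, Z, O, U, I within
the Cause field, are assumptions based on the usual MIPS layout, since
Figure 6-4 is not reproduced here. The function names are hypothetical.

```python
def decode_fcr31(value):
    """Split a 32-bit Control/Status register value into its fields."""
    return {
        "FS": (value >> 24) & 0x1,        # flush denormalized results to zero
        "C": (value >> 23) & 0x1,         # condition bit
        "cause": (value >> 12) & 0x3F,    # E, V, Z, O, U, I (assumed order)
        "enables": (value >> 7) & 0x1F,   # V, Z, O, U, I (assumed position)
        "flags": (value >> 2) & 0x1F,     # V, Z, O, U, I (assumed position)
        "RM": value & 0x3,                # rounding mode
    }


def exception_pending(value):
    """True if this register value would generate an FP exception."""
    fields = decode_fcr31(value)
    # Unimplemented Operation (E) has no enable bit and always traps.
    unimplemented = (fields["cause"] >> 5) & 0x1
    # Any other Cause bit traps only if its Enable bit is also set.
    enabled = fields["cause"] & 0x1F & fields["enables"]
    return bool(unimplemented or enabled)
```

A handler could use a check like exception_pending() to confirm that it
has cleared every enabled Cause bit before writing the register back
with CTC1.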

Control/Status Register Rounding Mode Control Bits. 

Bits 1 and 0 in the Control/Status register comprise the Rounding Mode
(RM) field. These bits specify the rounding mode that the FPU uses for
all floating-point operations, as shown in Table 6-2.

Table 6-2 Rounding Mode Bit Decoding

Rounding
Mode      Mnemonic  Description
0         RN        Round result to nearest representable value; round
                    to value with least-significant bit zero when the
                    two nearest representable values are equally near.
1         RZ        Round toward zero: round to value closest to and
                    not greater in magnitude than the infinitely
                    precise result.
2         RP        Round toward +∞: round to value closest to and not
                    less than the infinitely precise result.
3         RM        Round toward -∞: round to value closest to and not
                    greater than the infinitely precise result.
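
The four rounding modes can be illustrated with Python's decimal
module. This sketch rounds decimal values to integers rather than binary
values to a floating-point format, so it shows only the direction each
mode takes; the mapping of mnemonics to decimal rounding constants is
an illustrative assumption.

```python
from decimal import (Decimal, ROUND_CEILING, ROUND_DOWN, ROUND_FLOOR,
                     ROUND_HALF_EVEN)

# Mnemonics from Table 6-2 mapped to the matching decimal rounding rule.
ROUNDING = {
    "RN": ROUND_HALF_EVEN,  # nearest, ties to even
    "RZ": ROUND_DOWN,       # toward zero
    "RP": ROUND_CEILING,    # toward plus infinity
    "RM": ROUND_FLOOR,      # toward minus infinity
}


def round_to_integer(text, mode):
    """Round the decimal string `text` to an integer under `mode`."""
    return int(Decimal(text).quantize(Decimal("1"), rounding=ROUNDING[mode]))
```

Under RN, both 2.5 and 3.5 round to an even neighbor (2 and 4), which is
the tie-breaking rule described for round to nearest.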



Implementation and Revision Register FCR0 (Read Only)

The Implementation and Revision register specifies the implementation
and revision number of the FPU. This information can be used to
determine the coprocessor revision and performance level, and can
also be used by diagnostic software.
Figure 6-5 shows the layout of the register.

Implementation/Revision Register (FCR0)

 31          16 15         8 7          0
+--------------+------------+------------+
|      0       |    Imp     |    Rev     |
+--------------+------------+------------+
       16            8            8

Imp   Implementation number (0x05)
Rev   Revision number in the form of y.x
0     Reserved. Must be written as zeroes, returns zeroes when read.

Figure 6-5 Implementation/Revision Register

The revision number is a value of the form y.x, where y is a major
revision number held in bits 7..4, and x is a minor revision number
held in bits 3..0. The revision number can distinguish some chip
revisions; however, MIPS does not guarantee that changes to its chips
are necessarily reflected by the revision number, or that changes to the
revision number necessarily reflect real chip changes. For this reason
revision number values are not listed, and software should not rely on
the revision number to characterize the chip.
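
As a small sketch of the layout in Figure 6-5, the fields can be
extracted with shifts and masks. The function name decode_fcr0 and the
sample value used in the note below are hypothetical.

```python
def decode_fcr0(value):
    """Extract the Imp and Rev fields from an FCR0 value."""
    return {
        "imp": (value >> 8) & 0xFF,   # implementation number, bits 15..8
        "major": (value >> 4) & 0xF,  # y in y.x, bits 7..4
        "minor": value & 0xF,         # x in y.x, bits 3..0
    }
```

A made-up value of 0x0523 would decode as implementation 0x05,
revision 2.3.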



Floating-Point Formats 



The FPU performs both 32-bit (single-precision) and 64-bit (double- 
precision) IEEE standard floating-point operations. The 32-bit single- 
precision format has a 24-bit signed-magnitude fraction field (f+s) and 
an 8-bit exponent (e), as shown in Figure 6-6. 



 31  30        23 22                    0
+---+------------+-----------------------+
| s |     e      |           f           |
+---+------------+-----------------------+
  1       8                 23
Sign   Exponent          Fraction

Figure 6-6 Single-Precision Floating-Point Format

The 64-bit double-precision format has a 53-bit signed-magnitude
fraction field (f+s) and an 11-bit exponent, as shown in Figure 6-7.

 63  62        52 51                    0
+---+------------+-----------------------+
| s |     e      |           f           |
+---+------------+-----------------------+
  1      11                 52
Sign   Exponent          Fraction

Figure 6-7 Double-Precision Floating-Point Format

Numbers in these floating-point formats are composed of three fields:

• 1-bit sign: $s$

• biased exponent: $e = E + bias$

• fraction: $f = .b_1 b_2 \ldots b_{p-1}$

The range of the unbiased exponent $E$ includes every integer between
two values $E_{min}$ and $E_{max}$ inclusive, and also two other reserved
values: $E_{min} - 1$ (to encode ±0 and denormalized numbers), and
$E_{max} + 1$ (to encode ±∞ and NaNs [Not a Number]). For single- and
double-precision formats, each representable nonzero numerical
value has just one encoding.

For single- and double-precision formats, the value of a number, $v$, is
determined by the equations shown in Table 6-3.

Table 6-3 Equations for Calculating Values in Single and
Double-Precision Floating-Point Format

(1)  if $E = E_{max} + 1$ and $f \neq 0$, then $v$ is NaN, regardless of $s$.
(2)  if $E = E_{max} + 1$ and $f = 0$, then $v = (-1)^s \infty$.
(3)  if $E_{min} \le E \le E_{max}$, then $v = (-1)^s 2^E (1.f)$.
(4)  if $E = E_{min} - 1$ and $f \neq 0$, then $v = (-1)^s 2^{E_{min}} (0.f)$.
(5)  if $E = E_{min} - 1$ and $f = 0$, then $v = (-1)^s 0$.

For all floating-point formats, if $v$ is NaN, the most-significant bit of $f$
determines whether the value is a signaling or quiet NaN. $v$ is a
signaling NaN if the most-significant bit of $f$ is set; otherwise $v$ is a
quiet NaN. Table 6-4 defines the values for the format parameters.

Table 6-4 Floating-Point Format Parameter Values

                            Format
Parameter                   Single    Double
f                           24        53
$E_{max}$                   +127      +1023
$E_{min}$                   -126      -1022
exponent bias               +127      +1023
exponent width in bits      8         11
integer bit                 hidden    hidden
fraction width in bits      24        53
format width in bits        32        64
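
The five cases of Table 6-3 can be traced with a short decoder for the
single-precision format, using the parameters from Table 6-4. This is a
sketch for illustration only: the function name and the string return
values for infinities and NaNs are hypothetical, and the
signaling/quiet test follows the rule stated in this text (most
significant fraction bit set means signaling).

```python
from fractions import Fraction

E_MAX, E_MIN, BIAS, F_BITS = 127, -126, 127, 23


def decode_single(bits):
    """Decode a 32-bit pattern into an exact value or a special marker."""
    s = (bits >> 31) & 0x1
    e = (bits >> 23) & 0xFF
    f = bits & ((1 << F_BITS) - 1)
    E = e - BIAS                       # unbiased exponent
    sign = -1 if s else 1
    frac = Fraction(f, 1 << F_BITS)    # the value 0.f
    if E == E_MAX + 1:
        if f != 0:                     # case (1): NaN
            return "signaling NaN" if f >> (F_BITS - 1) else "quiet NaN"
        return "-inf" if s else "+inf"  # case (2)
    if E == E_MIN - 1:
        # cases (4) and (5); the sign of zero is not preserved here
        return sign * Fraction(2) ** E_MIN * frac
    return sign * Fraction(2) ** E * (1 + frac)   # case (3)
```

Using exact fractions avoids any rounding in the illustration itself, so
the smallest denormalized pattern 0x00000001 decodes to exactly 2^-149.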



Minimum and maximum floating-point values are given in Table 6-5.

Table 6-5 Minimum and Maximum Floating-Point Values

Type                    Value
Float Minimum           1.40129846e-45
Float Minimum Norm.     1.17549435e-38
Float Maximum           3.40282347e+38
Double Minimum          4.9406564584124654e-324
Double Minimum Norm.    2.2250738585072014e-308
Double Maximum          1.7976931348623157e+308
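
The values in Table 6-5 follow from the format parameters. The sketch
below derives them from the precision p (24 or 53 bits, counting the
hidden integer bit) and the exponent limits; the function name limits
and its parameter names are hypothetical.

```python
import math
import sys


def limits(p, e_min, e_max):
    """Return (smallest denormal, smallest normal, largest finite value)."""
    smallest = math.ldexp(1.0, e_min - (p - 1))      # only the last fraction bit set
    smallest_normal = math.ldexp(1.0, e_min)         # 1.0 x 2^E_min
    largest = math.ldexp(2.0 - math.ldexp(1.0, -(p - 1)), e_max)  # all fraction bits set
    return smallest, smallest_normal, largest


single = limits(24, -126, 127)
double = limits(53, -1022, 1023)
```

On a system whose Python float is an IEEE 754 double, the double-format
results agree with sys.float_info.min and sys.float_info.max.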
Binary Fixed-Point Format 



Binary fixed-point values are held in 2's-complement format.
Unsigned fixed-point values are not directly provided by the
floating-point instruction set. Binary fixed-point format is shown in
Figure 6-8.





 31  30                                 0
+---+------------------------------------+
| s |                 i                  |
+---+------------------------------------+
  1                   31
Sign               Integer

s   sign bit
i   integer value

Figure 6-8 Binary Fixed-Point Format
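
A 32-bit word in this format can be interpreted as a signed integer as
sketched below. The helper names are hypothetical.

```python
def to_signed32(word):
    """Interpret a 32-bit word as a 2's-complement integer."""
    word &= 0xFFFFFFFF
    # Bit 31 is the sign bit; when set, the value is word - 2^32.
    return word - (1 << 32) if word & 0x80000000 else word


def to_word32(value):
    """Encode a signed integer as a 32-bit 2's-complement word."""
    if not -(1 << 31) <= value < (1 << 31):
        raise ValueError("value does not fit in 32-bit fixed-point format")
    return value & 0xFFFFFFFF
```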



Instruction Set Overview 



All FPU instructions are 32 bits long, aligned on a word boundary, and 
can be divided into the following groups: 

• Load, Store, and Move instructions move data between 
memory, the main processor, and the FPU General-Purpose 
registers. 

• Conversion instructions perform conversion operations 
between the various data formats. 

• Computational instructions perform arithmetic operations 
on floating-point values in the FPU registers. 

• Compare instructions perform comparisons of the contents 
of registers and set a condition bit based on the results. 

• Branch on FPU Condition instructions perform a branch to 
the specified target if the specified coprocessor condition is 
met. 

Table 6-6 lists the instruction set of the FPU. A complete description of
each instruction is provided in Appendix B.

Table 6-6 FPU Instruction Summary

OP             Description

               Load/Store/Move Instructions
LWC1           Load Word to FPU
SWC1           Store Word from FPU
LDC1           Load Doubleword to FPU
SDC1           Store Doubleword From FPU
MTC1           Move word To FPU
MFC1           Move word From FPU
CTC1           Move Control word To FPU
CFC1           Move Control word From FPU
DMTC1          Doubleword Move To FPU
DMFC1          Doubleword Move From FPU

               Conversion Instructions
CVT.S.fmt      Floating-point Convert to Single FP
CVT.D.fmt      Floating-point Convert to Double FP
CVT.W.fmt      Floating-point Convert to Single Fixed Point
ROUND.W.fmt    Floating-point Round
TRUNC.W.fmt    Floating-point Truncate
CEIL.W.fmt     Floating-point Ceiling
FLOOR.W.fmt    Floating-point Floor

               Computational Instructions
ADD.fmt        Floating-point Add
SUB.fmt        Floating-point Subtract
MUL.fmt        Floating-point Multiply
DIV.fmt        Floating-point Divide
ABS.fmt        Floating-point Absolute value
MOV.fmt        Floating-point Move
NEG.fmt        Floating-point Negate
SQRT.fmt       Floating-point Square Root

               Compare Instructions
C.cond.fmt     Floating-point Compare

               Branch on FP Condition
BC1T           Branch on FPU True
BC1F           Branch on FPU False
BC1TL          Branch on FPU True Likely
BC1FL          Branch on FPU False Likely

.fmt           format specifier
.cond          condition specifier

Load, Store, and Move Instructions

All movement of data between the FPU and memory is accomplished
by:

• Load Word To Coprocessor 1 (LWC1) and Store Word From
Coprocessor 1 (SWC1) instructions, which reference a
single 32-bit word of the FPU general registers.

• Load Doubleword (LDC1) and Store Doubleword (SDC1)
instructions.

These load and store operations are unformatted; no format
conversions are performed and therefore no floating-point exceptions
occur due to these operations.

Data can also be moved directly between the FPU and the CPU by
Move To Coprocessor 1 (MTC1), Move From Coprocessor 1 (MFC1),
Doubleword Move To Coprocessor 1 (DMTC1), and Doubleword Move
From Coprocessor 1 (DMFC1) instructions. Like the floating-point
load and store operations, these operations perform no format
conversions and never cause floating-point exceptions.

The instruction immediately following a load can use the contents of
the loaded register. In such cases the hardware interlocks, requiring
additional real cycles; therefore, scheduling load delay slots is
desirable, although it is not required, for functional code.

All coprocessor loads and stores reference the following aligned data
items:

• For word loads and stores, the access type is always
WORD, and the low-order two bits of the address must
always be zero.

• For doubleword loads and stores, the access type is always
DOUBLEWORD, and the low-order three bits of the
address must always be zero.

Regardless of byte-numbering order (endianness), the address
specifies the byte that has the smallest byte address in the addressed
field. For a Big-endian system, it is the leftmost byte; for a
Little-endian system, it is the rightmost byte.
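
The alignment and byte-order rules above can be checked with a short
sketch. The function name is_aligned is hypothetical, and the struct
calls only model how a 32-bit word is laid out in memory under each byte
order.

```python
import struct


def is_aligned(address, size):
    """True if `address` is a multiple of `size` (4 = word, 8 = doubleword)."""
    return (address & (size - 1)) == 0


# The byte at the smallest address of the word 0x11223344:
big_endian_first = struct.pack(">I", 0x11223344)[0]     # leftmost byte
little_endian_first = struct.pack("<I", 0x11223344)[0]  # rightmost byte
```

Here big_endian_first is 0x11 (the most-significant, leftmost byte) and
little_endian_first is 0x44 (the least-significant, rightmost byte).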
Table 6-7 summarizes the load, store, and move instructions.

Table 6-7 FPU Load, Store, and Move Instruction Summary

Instruction          Format and Description

                     Fields: op | base | ft | offset

Load Word            LWC1 ft,offset(base)
to FPA               Sign-extend 16-bit offset and add to contents of
(coprocessor 1)      CPU register base to form address. Load contents
                     of addressed word into FPU general register ft.

Store Word           SWC1 ft,offset(base)
from FPA             Sign-extend 16-bit offset and add to contents of
(coprocessor 1)      CPU register base to form address. Store the
                     contents of FPU general register ft at addressed
                     location.

Load Doubleword      LDC1 ft,offset(base)
to FPA               Sign-extend 16-bit offset and add to contents of
(coprocessor 1)      CPU register base to form address. Load contents
                     of addressed doubleword into FPU general
                     registers ft and ft+1 (FR=0), or FPU general
                     register ft (FR=1).

Store Doubleword     SDC1 ft,offset(base)
from FPA             Sign-extend 16-bit offset and add to contents of
(coprocessor 1)      CPU register base to form address. Store the
                     64-bit contents of FPU general registers ft and
                     ft+1 (FR=0), or FPU general register ft (FR=1) at
                     addressed location.

                     Fields: COP1 | sub | rt | fs | 0

Move Word            MTC1 rt,fs
to FPA               Move contents of CPU register rt into FPU general
(coprocessor 1)      register fs.

Move Word            MFC1 rt,fs
from FPA             Move contents of FPU general register fs into CPU
(coprocessor 1)      register rt.

Move Control         CTC1 rt,fs
Word to FPA          Move contents of CPU register rt into FPU control
(coprocessor 1)      register fs.

Move Control         CFC1 rt,fs
Word from FPA        Move contents of FPU control register fs into CPU
(coprocessor 1)      register rt.

Doubleword           DMTC1 rt,fs
Move to FPA          Move contents of CPU register rt into FPU general
(coprocessor 1)      register fs.

Doubleword           DMFC1 rt,fs
Move from FPA        Move contents of FPU general register fs into CPU
(coprocessor 1)      register rt.
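
The address formation used by LWC1, SWC1, LDC1, and SDC1 (sign-extend
the 16-bit offset and add it to the base register) can be sketched as
follows. The function names and the width parameter, which models the
size of the address register, are hypothetical.

```python
def sign_extend16(value):
    """Sign-extend a 16-bit field to a Python integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def effective_address(base, offset16, width=32):
    """Add the sign-extended offset to base, wrapping at `width` bits."""
    return (base + sign_extend16(offset16)) & ((1 << width) - 1)
```

For instance, an offset field of 0xFFFC represents -4, so with a base of
0x1000 the access goes to address 0xFFC.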
Floating-Point Conversion Instructions



Conversion instructions perform conversions between the various
data formats, such as single- or double-precision, fixed- or
floating-point formats. Table 6-8 lists the conversion instructions and
their formats.

Table 6-8 FPU Conversion Instruction Summary

Instruction          Format and Description

                     Fields: COP1 | fmt | 0 | fs | fd | function

Floating-Point       CVT.S.fmt fd,fs
Convert to Single    Interpret the contents of FPU register fs in the
FP Format            specified format (fmt) and arithmetically convert
                     to the single binary floating-point format. Place
                     the rounded result in FPU register fd.

Floating-Point       CVT.D.fmt fd,fs
Convert to Double    Interpret the contents of FPU register fs in the
FP Format            specified format (fmt) and arithmetically convert
                     to the double binary floating-point format. Place
                     the rounded result in FPU register fd.

Floating-Point       CVT.W.fmt fd,fs
Convert to Single    Interpret the contents of FPU register fs in the
Fixed-Point Format   specified format (fmt) and arithmetically convert
                     to the single fixed-point format. Place the
                     result in FPU register fd.

Floating-Point       ROUND.W.fmt fd,fs
Round                Interpret the contents of FPU register fs in the
                     specified format (fmt) and arithmetically convert
                     to the single fixed-point format. Place the
                     result in FPU register fd.

Floating-Point       TRUNC.W.fmt fd,fs
Truncate             Interpret the contents of FPU register fs in the
                     specified format (fmt) and arithmetically convert
                     to the single fixed-point format. Place the
                     result in FPU register fd.

Floating-Point       CEIL.W.fmt fd,fs
Ceiling              Interpret the contents of FPU register fs in the
                     specified format (fmt) and arithmetically convert
                     to the single fixed-point format. Place the
                     result in FPU register fd.

Floating-Point       FLOOR.W.fmt fd,fs
Floor                Interpret the contents of FPU register fs in the
                     specified format (fmt) and arithmetically convert
                     to the single fixed-point format. Place the
                     result in FPU register fd.

Floating-Point Computational Instructions



Computational instructions perform arithmetic operations on
floating-point values in registers. There are two categories of
computational instructions, as shown in Table 6-9 and listed below:

• 3-Operand Register-Type instructions, which perform
floating-point addition, subtraction, multiplication, and
division operations.

• 2-Operand Register-Type instructions, which perform
floating-point absolute value, move, negate, and square root
operations.

Table 6-9 FPU Computational Instruction Summary

Instruction          Format and Description

                     Fields: COP1 | fmt | ft | fs | fd | function

Floating-Point       ADD.fmt fd,fs,ft
Add                  Interpret the contents of FPU registers fs and ft
                     in the specified format (fmt) and add
                     arithmetically. Place the rounded result in FPU
                     register fd.

Floating-Point       SUB.fmt fd,fs,ft
Subtract             Interpret the contents of FPU registers fs and ft
                     in the specified format (fmt) and arithmetically
                     subtract. Place the rounded result in FPU
                     register fd.

Floating-Point       MUL.fmt fd,fs,ft
Multiply             Interpret the contents of FPU registers fs and ft
                     in the specified format (fmt) and arithmetically
                     multiply. Place the rounded result in FPU
                     register fd.

Floating-Point       DIV.fmt fd,fs,ft
Divide               Interpret the contents of FPU registers fs and ft
                     in the specified format (fmt) and arithmetically
                     divide fs by ft. Place the rounded result in FPU
                     register fd.

Floating-Point       ABS.fmt fd,fs
Absolute Value       Interpret the contents of FPU register fs in the
                     specified format (fmt) and take arithmetic
                     absolute value. Place the result in FPU register
                     fd.

Floating-Point       MOV.fmt fd,fs
Move                 Interpret the contents of FPU register fs in the
                     specified format (fmt) and copy into FPU register
                     fd.

Floating-Point       NEG.fmt fd,fs
Negate               Interpret the contents of FPU register fs in the
                     specified format (fmt) and take arithmetic
                     negation. Place the result in FPU register fd.

Floating-Point       SQRT.fmt fd,fs
Square Root          Interpret the contents of FPU register fs in the
                     specified format (fmt) and take the positive
                     arithmetic square root. Result is rounded then
                     placed in the FPU register fd.

In the instruction formats shown in Table 6-8 and Table 6-9, the fmt
term appended to the instruction opcode is the data format specifier:
s specifies single-precision binary floating-point, d specifies
double-precision binary floating-point, and w specifies binary
fixed-point. For example, an ADD.D specifies that the operands for the
addition operation are double-precision binary floating-point values.
When fmt is single-precision or binary fixed-point, the odd register of
the destination is undefined.



Floating-Point Compare Operations 

The floating-point Compare (C.cond.fmt) instructions interpret the
contents of two FPU registers (fs, ft) in the specified format (fmt) and
arithmetically compare them. A result is determined based on the
comparison and conditions (cond) specified in the instruction.
Table 6-10 summarizes the floating-point Compare instructions and
Table 6-11 lists the conditions that can be specified for the compare
operation.

Table 6-10 FPU Compare Instruction Summary

Instruction          Format and Description

                     Fields: COP1 | fmt | ft | fs | 0 | function

Floating-Point       C.cond.fmt fs,ft
Compare              Interpret the contents of FPU registers fs and ft
                     in the specified format (fmt) and compare
                     arithmetically. The result is determined by the
                     comparison and the specified condition (cond).
                     After a one-instruction delay, the condition is
                     available for testing by the CPU with the Branch
                     on Floating-Point Coprocessor Condition (BC1T,
                     BC1F) instructions.

Table 6-11 Relational Mnemonic Definitions

Mnemonic  Definition                 Mnemonic  Definition
F         False                      T         True
UN        Unordered                  OR        Ordered
EQ        Equal                      NEQ       Not Equal
UEQ       Unordered or Equal         OLG       Ordered or Less than or
                                               Greater than
OLT       Ordered Less Than          UGE       Unordered or Greater than
                                               or Equal
ULT       Unordered or Less Than     OGE       Ordered Greater than or
                                               Equal
OLE       Ordered Less than or       UGT       Unordered or Greater Than
          Equal
ULE       Unordered or Less than     OGT       Ordered Greater Than
          or Equal
SF        Signaling False            ST        Signaling True
NGLE      Not Greater than or        GLE       Greater than, or Less than
          Less than or Equal                   or Equal
SEQ       Signaling Equal            SNE       Signaling Not Equal
NGL       Not Greater than or        GL        Greater Than or Less Than
          Less than
LT        Less Than                  NLT       Not Less Than
NGE       Not Greater than or        GE        Greater Than or Equal
          Equal
LE        Less than or Equal         NLE       Not Less Than or Equal
NGT       Not Greater Than           GT        Greater Than
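
The left-hand mnemonics of Table 6-11 can be modeled as combinations of
three basic relations: less than, equal, and unordered (at least one
operand is a NaN). The sketch below assumes the conventional MIPS
encoding, in which the position of a mnemonic in the list gives a 4-bit
code whose low three bits select unordered, equal, and less; that
encoding is not stated in this section and is an assumption. The fourth
bit, which requests signaling on unordered operands, is ignored here.

```python
import math

# Left-column mnemonics of Table 6-11, in assumed encoding order 0..15.
CONDITIONS = ("F", "UN", "EQ", "UEQ", "OLT", "ULT", "OLE", "ULE",
              "SF", "NGLE", "SEQ", "NGL", "LT", "NGE", "LE", "NGT")


def compare(cond, fs, ft):
    """Return the condition-bit value for C.cond applied to fs and ft."""
    code = CONDITIONS.index(cond)
    unordered = math.isnan(fs) or math.isnan(ft)
    less = (not unordered) and fs < ft
    equal = (not unordered) and fs == ft
    return bool((code & 4 and less) or
                (code & 2 and equal) or
                (code & 1 and unordered))
```

The right-hand mnemonic in each table row is the logical complement of
the left-hand one, so testing it corresponds to branching on false
rather than on true.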
Branch on FPU Condition Instructions

Table 6-12 summarizes the four Branch on FPU (coprocessor unit 1)
Condition instructions that can be used to test the result of the FPU
Compare (C.cond) instructions. In this table, delay slot refers to the
instruction immediately following the branch instruction. Refer to
Chapter 2 for a discussion of the branch delay slot.

Table 6-12 Branch on FPU Condition Instructions

Instruction          Format and Description

                     Fields: COP1 | BC | br | offset

Branch on            BC1T offset
FPU True             Compute a branch target address by adding the
                     address of the instruction in the delay slot and
                     the 16-bit offset (shifted left two bits and sign
                     extended). Branch to the target address (with a
                     delay of one instruction) if the FPU condition
                     line is true.

Branch on            BC1F offset
FPU False            Compute a branch target address by adding the
                     address of the instruction in the delay slot and
                     the 16-bit offset (shifted left two bits and sign
                     extended). Branch to the target address (with a
                     delay of one instruction) if the FPU condition
                     line is false.

Branch on            BC1TL offset
FPU True             Compute a branch target address by adding the
Likely               address of the instruction in the delay slot and
                     the 16-bit offset (shifted left two bits and sign
                     extended). Branch to the target address (with a
                     delay of one instruction) if the FPU condition
                     line is true. If the conditional branch is not
                     taken, the instruction in the branch delay slot is
                     nullified.

Branch on            BC1FL offset
FPU False            Compute a branch target address by adding the
Likely               address of the instruction in the delay slot and
                     the 16-bit offset (shifted left two bits and sign
                     extended). Branch to the target address (with a
                     delay of one instruction) if the FPU condition
                     line is false. If the conditional branch is not
                     taken, the instruction in the branch delay slot is
                     nullified.
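
The branch target computation described in Table 6-12 can be sketched
as follows. The function name is hypothetical; the 4-byte step to the
delay slot follows from the 32-bit instruction length stated earlier.

```python
def branch_target(branch_address, offset16):
    """Target of BC1T/BC1F/BC1TL/BC1FL located at `branch_address`."""
    offset = offset16 & 0xFFFF
    if offset & 0x8000:
        offset -= 0x10000               # sign-extend the 16-bit field
    delay_slot = branch_address + 4     # instruction following the branch
    return delay_slot + (offset << 2)   # offset counts words, not bytes
```

With a branch at 0x1000, an offset of 3 targets 0x1010, and an offset
field of 0xFFFF (that is, -1) targets the branch instruction itself.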
FPU Instruction Pipeline 



The FPU provides an instruction pipeline that parallels the CPU 
instruction pipeline. It shares the same 8-stage pipeline architecture 
with the CPU, as described in Chapter 2, Instruction Set Summary. 



Instruction Execution 



Figure 6-9 illustrates how the eight instructions overlap in the FPU
pipeline.

Figure 6-9 FPU Instruction Pipeline

Figure 6-9 assumes that one instruction is completed every PCycle.
Most FPU instructions, however, require more than one cycle in the
EX stage. Therefore, the FPU must stall the pipeline if an instruction
execution cannot proceed because of register or resource conflicts.
Figure 6-10 illustrates the effect of a three-cycle stall on the FPU
pipeline.

Figure 6-10 FPU Pipeline Stall

To mitigate the performance impact that would result from stalling
the instruction pipeline, the FPU allows instructions to overlap so that
instruction execution can proceed as long as there are no resource
conflicts, data dependencies, or exception conditions. The following
sections describe the timing and overlapping of FPU instructions.

Instruction Execution Times

Unlike the CPU, which executes almost all instructions in a single
cycle, the FPU has instruction execution times that span a much wider
range.

Table 6-13 gives the minimum latency, in processor pipeline cycles, of
each floating-point operation for the currently implemented
configurations. These latency calculations assume the result of the
operation is immediately used in a succeeding operation.

Table 6-13 Floating-Point Operation Latencies

                 Pipeline cycles
Operation        Single    Double    Word
ADD.fmt          4         4         (b)
SUB.fmt          4         4         (b)
MUL.fmt          7         8         (b)
DIV.fmt          23        36        (b)
SQRT.fmt         54        112       (b)
ABS.fmt          2         2         (b)
MOV.fmt          1         1         (b)
NEG.fmt          2         2         (b)
ROUND.W.fmt      4         4         (b)
TRUNC.W.fmt      4         4         (b)
CEIL.W.fmt       4         4         (b)
FLOOR.W.fmt      4         4         (b)
CVT.S.fmt        (b)       4         6
CVT.D.fmt        2         (b)       5
CVT.W.fmt        4         4         (b)
C.cond.fmt       3(a)      3(a)      (b)
BC1T             (c)       1         (c)
BC1F             (c)       1         (c)
BC1TL            (c)       1         (c)
BC1FL            (c)       1         (c)
LWC1             (c)       3         (c)
SWC1             (c)       1         (c)
LDC1             (c)       3         (c)
SDC1             (c)       1         (c)
MTC1             (c)       3(a)      (c)
MFC1             (c)       3         (c)
CTC1             (c)       3(a)      (c)
CFC1             (c)       2         (c)

(a)  Software must schedule operations so that an FPU register that is
     the target of a floating-point load or move is not read until at
     least two instructions later. Software must also schedule a
     floating-point branch instruction two or more instructions after a
     floating-point compare instruction.
(b)  These operations are illegal.
(c)  These operations are undefined.
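
As a rough illustration of how the latencies in Table 6-13 add up, the
sketch below sums the minimum latencies of a chain of operations in
which each result is immediately used by the next. Treating the total
as a simple sum is a simplifying assumption that ignores stalls,
overlap, and the scheduling constraints described later; the names
LATENCY and chain_latency are hypothetical.

```python
# Minimum latencies (single, double) copied from Table 6-13.
LATENCY = {
    "ADD": (4, 4), "SUB": (4, 4), "MUL": (7, 8), "DIV": (23, 36),
    "SQRT": (54, 112), "ABS": (2, 2), "MOV": (1, 1), "NEG": (2, 2),
}


def chain_latency(operations):
    """Sum latencies for a dependent chain of (op, fmt) pairs."""
    total = 0
    for op, fmt in operations:
        single, double = LATENCY[op]
        total += single if fmt == "S" else double
    return total
```

For example, a MUL.D whose result feeds an ADD.D has a combined minimum
latency of 8 + 4 = 12 pipeline cycles under this model.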
Scheduling FPU Instructions

The floating-point architecture permits the pipelining of operations
and the overlapping of floating-point operations with floating-point
load, store, and move instructions and with other processor
operations.

The FPU coprocessor implements three separate operation (op) units:
multiply, divide, and an adder for remaining operations.
Multiplies and divides can overlap with adder operations; however,
they use the adder on their final cycles, which imposes some
limitations.

The multiply unit can begin a new double-precision multiply every
four cycles, and a new single-precision multiply every three cycles.
The adder generally begins a new operation one cycle before the
previous operation completes; therefore, a floating-point add or
subtract can start every three cycles.

The FPU coprocessor pipeline is fully bypassed and interlocked.
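
The issue rates quoted above can be turned into earliest start cycles
for a run of independent operations of the same kind. This is a
simplified sketch that ignores interactions between units; the names
ISSUE_INTERVAL and earliest_starts are hypothetical.

```python
# Cycles between successive starts, taken from the text above.
ISSUE_INTERVAL = {"MUL.D": 4, "MUL.S": 3, "ADD": 3, "SUB": 3}


def earliest_starts(op, count):
    """Earliest start cycle of each of `count` independent `op`s."""
    return [i * ISSUE_INTERVAL[op] for i in range(count)]
```

Three independent double-precision multiplies could therefore start no
earlier than cycles 0, 4, and 8.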



FPU Pipeline Overlapping 

The FPU has three operational (op) units: adder, divider, and 

multiplier. Each op unit is controlled by an FPU resource scheduler, 

which issues instructions under certain constraints, as described in the 

following section. 

Table 6-14 lists the pipe stages used in the op units (although not all 

stages are used by each unit). 

Table 6-U FPU Operational Unit Pipe Stages 



Stage 


Description 


A 


FPU Adder Mantissa Add stage 


E 


FPU Adder Exception lest stage 


EX 


CPU EX stage 


M 


FPU Multiplier 1st stage 


N 


FPU Multiplier 2nd stage 


R 


FPU Adder Result Round stage 


S 


FPU Adder Operand Shift stage 


U 


FPU Unpack stage 
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Instruction Scheduling Constraints 

The FPU resource scheduler is kept from issuing instructions to FPU 
operation units (adder, multiplier, and divider) by the limitations in 
their miao-architectures listed below. If any of the following 
constraints are violated, the operation unit assumes the outstanding 
instruction in its pipe is discarded, and then continues operation on 
the most recently issued instruction. 

FPU Divider Constraints 

Handles only one non-overlapped divide instruction in its pipe at any 
one time. 

FPU Multiplier Constraints 

Allows up to two pipelined MUL.[S,D] instructions to be processed as 
long as the following constraints are met 

• Two idle cycles are required after MULS (shown in Figure 
6-11) 

• Three idle cycles are required after MULX> (shown in 
Figure 6-12). 

These figures are not meant to imply that back-to-back multiplies are 
allowed. Rather, as shown in Figure 6-11, 12 and 13 are illegal and 15, 
16, 17, and 18 are successive stages of 14, referenced to II. Figure 6-12 is 
similar, in that 16, 17, and 18 are successive stages of 15. 

Figure 6-11 MUL.S Instruction Scheduling in R4000 FPU Multiplier

Figure 6-12 MUL.D Instruction Scheduling in R4000 FPU Multiplier

FPU Adder Constraints 

The following constraints must be met in the FPU adder op unit:

• The adder op unit allows one clock cycle overlap between 
each newly-issued instruction and the instruction being 
completed (as shown in Figure 6-13). 

Figure 6-13 Instruction Cycle Overlap in R4000 FPU Adder

• The adder allows the cleanup stages (A, R) of a multiply
instruction to be pipelined with the execution of
ADD.[S,D], SUB.[S,D] or C.COND.[S,D], as long as no two
instructions attempt to simultaneously use the same A and
R pipe stages. For instance, Figure 6-14 shows a resource
conflict between the mantissa add (A, stage 7) of
instructions 1, 5, and 6. This figure also shows the resource
conflict between result round (R) stage 8 of instructions 1,
5, and 6. The multiply cleanup cycles (A, R) can neither
overlap nor pipeline with any other instruction currently
in the adder's pipe. These constraints are shown in Figure
6-15, Figure 6-16, and Figure 6-17.

Figure 6-14 MUL.D and ADD.[S,D] Cycle Conflict in R4000 FPU Adder

Figure 6-15 MUL.S and ADD.[S,D] Cycle Conflict in R4000 FPU Adder

Figure 6-16 MUL.D and CMP.[S,D] Cleanup Cycle Conflict in R4000 FPU Adder

Figure 6-17 MUL.S and CMP.[S,D] Cleanup Cycle Conflict in R4000 FPU Adder

• The adder does not allow the preparation (U stage) and
cleanup cycles (N, A, R) of a divide instruction to be
pipelined with any other instruction; however, the adder
does allow the last cycle of preparation or cleanup to be
overlapped one clock by the following instruction's U
stage (the CPU EX cycle) as shown in Figure 6-18.

Figure 6-18 Adder Prep and Cleanup Cycle Overlap

Instruction Latency, Repeat Rate and Pipeline Stage Sequences 

Table 6-15 shows the latency and repeat rate between instructions, 
together with the sequence of pipeline stages for each instruction. For 
instance, the latency of the ADD.[S,D] is 4, which means it takes four 
processor cycles to complete. The Repeat Rate column indicates how 
soon an instruction can be repeated; for instance, an ADD.[S,D] can be 
repeated after the conclusion of the third pipeline stage. 

Table 6-15 Latency, Repeat Rate, and Pipe Stages of R4000 FPU Instructions

Instruction Type     Latency  Repeat  Pipeline Stage Sequence
                              Rate
-------------------  -------  ------  ------------------------------------------
MOV.[S,D]            1        1       EX
ADD.[S,D]            4        3       U -> S+A -> A+R -> R+S
SUB.[S,D]            4        3       U -> S+A -> A+R -> R+S
C.COND.[S,D]         3        2       U -> A -> R
NEG.[S,D]            2        1       U -> S
ABS.[S,D]            2        1       U -> S
CVT.S.W              6        5       U -> A -> R -> S -> A -> R
CVT.D.W              5        4       U -> S -> A -> R -> S
CVT.S.L              7        6       U -> A -> R -> S -> S -> A -> R
CVT.D.L              4        3       U -> A -> R -> S
CVT.D.S              2        1       U -> S
CVT.S.D              4        3       U -> S -> A -> R
CVT.W.[S,D] or       4        3       U -> S -> A -> R
ROUND.W.[S,D] or
TRUNC.W.[S,D] or
CEIL.W.[S,D] or
FLOOR.W.[S,D]
MUL.S                7        3       U -> E/M -> M -> M -> N -> N/A -> R
MUL.D                8        4       U -> E/M -> M -> M -> M -> N -> N/A -> R
DIV.S                23       22      U -> S+A -> S+R -> S -> D...D -> D/A -> D/R ->
                                      D/A -> D/R -> A -> R
DIV.D                36       35      U -> A -> R -> D...D -> D/A -> D/R -> D/A ->
                                      D/R -> A -> R
SQRT.S               2-54     2-53    U -> E -> A+R -> ... -> A+R -> A -> R
SQRT.D               2-112    2-111   U -> E -> A+R -> ... -> A+R -> A -> R
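
The relationship between the Latency and Repeat Rate columns can be illustrated with a first-order estimate. The sketch below is an assumption-laden simplification: it considers a run of independent instructions of a single type and ignores the cross-unit scheduling rules and data dependencies. The function and dictionary names are hypothetical.

```python
# (latency, repeat rate) pairs taken from Table 6-15 for two instruction types.
TIMING = {
    "ADD": (4, 3),
    "MUL.D": (8, 4),
}


def back_to_back_cycles(op: str, count: int) -> int:
    """Cycles to finish `count` independent instructions of one type.

    First-order model: each new instruction issues `repeat rate` cycles
    after the previous one, and the last one needs `latency` cycles.
    """
    if count <= 0:
        return 0
    latency, repeat_rate = TIMING[op]
    return (count - 1) * repeat_rate + latency


print(back_to_back_cycles("ADD", 3))    # 10
print(back_to_back_cycles("MUL.D", 2))  # 12
```

Under this model three ADD.[S,D] instructions take 3 + 3 + 4 = 10 cycles rather than 12, because each one can start before the previous one has finished.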

Resource Scheduling Rules

The FPU Resource Scheduler issues instructions while adhering to the 
rules described below. These scheduling rules optimize op unit 
executions; if the following rules are not followed the hardware 
interlocks to guarantee correct operation.

DIV.[S,D] can start only when all of the following conditions are met
in the RF stage:

• The divider is idle. 

• The adder is either idle, or in its second-to-last execution 
cycle. 

• The multiplier is either idle, or in its first execution cycle.

Idle means an operation unit (adder, multiplier, or divider) is either not
processing any instruction, or is currently at its last execution cycle
completing an instruction.
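
The DIV.[S,D] start conditions and the definition of idle can be expressed as a small predicate. This is a minimal sketch, not the scheduler's actual logic: the `UnitState` class, its fields, and the cycle counts in the usage lines are hypothetical and exist only to make the three conditions concrete.

```python
from dataclasses import dataclass


@dataclass
class UnitState:
    """Hypothetical snapshot of one op unit (adder, multiplier, or divider)."""
    cycle: int = 0   # 1-based execution cycle in progress; 0 means no instruction
    total: int = 0   # total execution cycles of the instruction in the unit

    def idle(self) -> bool:
        # Idle: not processing anything, or in the last execution cycle.
        return self.cycle == 0 or self.cycle == self.total

    def second_to_last(self) -> bool:
        return self.cycle != 0 and self.cycle == self.total - 1

    def first(self) -> bool:
        return self.cycle == 1


def div_can_start(divider: UnitState, adder: UnitState,
                  multiplier: UnitState) -> bool:
    """Evaluate the three DIV.[S,D] start conditions listed above."""
    return (
        divider.idle()
        and (adder.idle() or adder.second_to_last())
        and (multiplier.idle() or multiplier.first())
    )


# Adder in cycle 3 of 4, multiplier in cycle 1 of 7: all conditions hold.
print(div_can_start(UnitState(), UnitState(3, 4), UnitState(1, 7)))  # True
# Adder in cycle 2 of 4: neither idle nor in its second-to-last cycle.
print(div_can_start(UnitState(), UnitState(2, 4), UnitState()))      # False
```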

MUL.[S,D] can start only when all of the following conditions are met 
in the RF stage: 

• The multiplier is either idle, or it is: 

- within the third execution cycle (EX+2) if the most 
recent instruction in multiplier's pipe is MUL.S, or 

- within the fourth execution cycle (EX+3) if the most 
recent instruction in multiplier's pipe is MUL.D. 

• The adder is either idle, or it must not be: 

- processing the first execution cycle (EX) of a conversion 
from long integer to short floating-point, CVT.S.L, 

- within the first three preparation cycles (EX..EX+2) of a 
DIV.S, 

- in the second preparation cycle (EX+1) of a DIV.D, or 

- processing a square root instruction. 

• The divider is either idle, or it must not be: 

- executing within the last fifteen cycles of a DIV.[S,D], 

- in the second execution cycle (EX+1) of a DIV.D, or 

- in the first three execution cycles (EX..EX+2) of a DIV.S. 

SQRT.[S,D] can start only when both of the following conditions are met
in the RF stage:

• The adder is either idle, or it is in its second-to-last
execution cycle.

• The multiplier and divider must be idle.

CVT.fmt instructions can start only when all of the following
conditions are met in the RF stage:

• The adder is either idle, or it is in its second-to-last 
execution cycle. 

• The multiplier is either idle, or in one of the states 
described below: 

- If the instruction is a CVT.S.L, CVT.S.W or CVT.D.W,
the multiplier must be idle.

- If the instruction is a CVT.D.L, CVT.S.D, CVT.W.[S,D],
CEIL.W.[S,D], FLOOR.W.[S,D], ROUND.W.[S,D], or
TRUNC.W.[S,D], the multiplier must not be executing
beyond the first cycle (EX) of a MUL.S or the second
cycle (EX+1) of a MUL.D. If two multiply instructions
have already been initiated in the multiplier, none of
these convert instructions is allowed to start.

- If the instruction is a CVT.D.S, the multiplier must not
be executing the second-to-last execution cycle of either
the first or second MUL.[S,D] in the multiplier pipe.

• The divider is idle, or not executing the first three 
(EX..EX+2) nor the last fifteen cycles of a DIV.[S,D]. 

ADD.[S,D] or SUB.[S,D] can start only when all of the following 
conditions are met in the RF stage: 

• The adder is either idle, or it is in its second-to-last 
execution cycle. 

• The multiplier is either idle, or, among two possible 
MUL.[S,D] instructions, it is not executing within either the 
fourth or fifth execution cycle from the last. 

• The divider is either idle, or it is not executing within the 
first three (EX..EX+2) nor the last fifteen cycles of a 
DIV.[S,D]. 

NEG.[S,D] or ABS.[S,D] can start only when all of the following
conditions are met in the RF stage: 

• The adder is either idle, or it is in its second-to-last 
execution cycle. 

• The multiplier is either idle, or it is not executing the 
second-to-last execution cycle. 

• The divider is either idle, or it is not executing the first 
three (EX..EX+2) nor the last fifteen cycles of a DIV.[S,D]. 

C.COND.[S,D] can start only when all of the following conditions are
met in the RF stage: 

• The adder is either idle, or it is in its second-to-last 
execution cycle. 

• The multiplier is either idle, or it is not executing the
fourth cycle from the last.

• The divider is either idle, or it is not executing the first 
three (EX..EX+2) nor the last fifteen cycles of a DIV.[S,D]. 

This chapter describes how the FPU handles floating-point
exceptions. A floating-point exception occurs whenever the FPU
cannot handle the operands or results of a floating-point operation in
the normal way. The FPU responds either by generating an exception
to initiate a software trap or by setting a status flag.

The FP Control/Status register described in Chapter 6, Floating-Point
Unit, contains an enable bit for each exception type; these exception
enable bits determine whether an exception will cause the FPU to
initiate a trap or set a status flag. If a trap is taken, the FPU remains in
the state found at the beginning of the operation and a software
exception handling routine is executed. If no trap is taken, an
appropriate value is written into the FPU destination register and
execution continues.

The FPU supports the five IEEE Standard 754 exceptions:

• Inexact (I) 

• Overflow (O) 

• Underflow (U) 

• Divide by Zero (Z) 

• Invalid Operation (V) 

Each of these exceptions has Cause bits, Enable bits, and Flag bits
(status flags).

The FPU adds a sixth exception type, unimplemented operation (E), to
be used when the FPU cannot implement the standard MIPS
floating-point architecture, including cases where the FPU cannot
determine the correct exception behavior. This exception indicates that
a software implementation must be used. The unimplemented
operation exception has no Enable or Flag bit; whenever this exception
occurs, an unimplemented exception trap is taken (if the FPU
interrupt input to the CPU is enabled).

Figure 7-1 illustrates the Control/Status register bits used to support
exceptions.

Figure 7-1 Control/Status Register Exception/Flag/Trap/Enable Bits

Each of the five IEEE standard exceptions (V, Z, O, U, I) is associated 
with a trap under user control, which is enabled by setting one of the 
five Enable bits. When an exception occurs, both the corresponding 
Cause and Flag bits are set. If the corresponding Enable bit is set, the
FPU generates an interrupt to the CPU and the subsequent exception 
processing allows a trap to be taken. 
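
The interaction of the Cause, Enable, and Flag bits can be sketched in a few lines. This is an illustration under stated assumptions, not a description of the hardware: the bit positions follow the conventional MIPS FCR31 layout and should be checked against Figure 7-1, the function name is hypothetical, and the sketch sets the Flag bit only when no trap is signaled, which is how the Flags section of this chapter describes it.

```python
from typing import Tuple

# ASSUMPTION: conventional MIPS FCR31 bit positions (Cause 16:12,
# Enables 11:7, Flags 6:2, with I in the lowest bit of each field).
CAUSE_BIT = {"V": 16, "Z": 15, "O": 14, "U": 13, "I": 12}
ENABLE_BIT = {"V": 11, "Z": 10, "O": 9, "U": 8, "I": 7}
FLAG_BIT = {"V": 6, "Z": 5, "O": 4, "U": 3, "I": 2}


def raise_ieee_exception(fcr31: int, exc: str) -> Tuple[int, bool]:
    """Record exception `exc` in fcr31; return (new_fcr31, trap_requested)."""
    fcr31 |= 1 << CAUSE_BIT[exc]                  # Cause bit is always set
    trap = bool(fcr31 & (1 << ENABLE_BIT[exc]))   # trap only if enabled
    if not trap:
        fcr31 |= 1 << FLAG_BIT[exc]               # sticky flag when untrapped
    return fcr31, trap


# Inexact with its trap disabled: Cause and Flag are set, no trap.
value, trap = raise_ieee_exception(0, "I")
print(hex(value), trap)  # 0x1004 False
```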



Exception Trap Processing 



When a floating-point exception trap is taken, the Cause register 
indicates that the floating-point coprocessor is the cause of the 
exception trap. The FPE code is used, and the Cause bits of the 
floating-point Control/Status register indicate the reason for the
floating-point exception. These bits are, in effect, an extension of the 
system coprocessor Cause register. 

Flags



For each IEEE exception, a Flag bit is provided. This Flag bit is set on 
any occurrence of its corresponding exception condition, with no 
corresponding exception trap signaled. The Flag bit is reset by writing 
a new value into the Status register; flags can be saved and restored 
individually, or as a group, by software. 

When no exception trap is signaled, a default action is taken by the 
floating-point coprocessor, which provides a substitute value for the 
exception-causing result of the floating-point operation. The 
particular default action taken depends upon the type of exception, 
and in the case of the Overflow exception, the current rounding mode. 
Table 7-1 lists the default action taken by the FPU for each of the IEEE 
exceptions. 

Table 7-1 Default FPU Exception Actions

Field  Description          Rounding  Default action
                            mode
-----  -------------------  --------  ----------------------------------------
V      Invalid operation    Any       Supply a quiet Not a Number (NaN)
Z      Division by zero     Any       Supply a properly signed ∞
O      Overflow exception   RN        Modify overflow values to ∞ with the
                                      sign of the intermediate result
                            RZ        Modify overflow values to the format's
                                      largest finite number with the sign of
                                      the intermediate result
                            RP        Modify negative overflows to the
                                      format's most negative finite number;
                                      modify positive overflows to +∞
                            RM        Modify positive overflows to the
                                      format's largest finite number; modify
                                      negative overflows to -∞
U      Underflow exception  Any       Supply a rounded result
I      Inexact exception    Any       Supply a rounded result



The FPU detects internally the eight conditions that can cause 
exceptions. When the FPU encounters one of these unusual situations, 
it causes either an IEEE exception or an Unimplemented Operation 
exception (E). Table 7-2 lists the exception-causing situations and 
contrasts the behavior of the FPU with the requirements of the IEEE 
standard. 

Table 7-2 FPU Exception-Causing Conditions

FPA Internal Result   IEEE      Trap     Trap      Note
                      Standard  Enabled  Disabled
--------------------  --------  -------  --------  ------------------------------------------
Inexact result        I         I        I         Loss of accuracy
Exponent overflow     O,I*      O,I      O,I       Normalized exponent > Emax
Divide-by-zero        Z         Z        Z         Zero is (exponent = Emin-1, mantissa = 0)
Overflow on convert   V         V        E         Source out of integer range
Signaling NaN source  V         V        E         Quiet NaN source produces quiet NaN result
Invalid operation     V         V        E         0/0, etc.
Exponent underflow    U         E        E         Normalized exponent < Emin
Denormalized source   none      E        E         Exponent = Emin-1 and mantissa ≠ 0

* The standard specifies an Inexact exception on overflow only if the overflow trap is disabled.



The following sections describe the conditions that cause the FPU to
generate each of its exceptions and detail the FPU response to each
exception-causing situation.



Inexact Exception (I) 



The FPU generates the Inexact exception if the rounded result of an 
operation is not exact or if it overflows. 

NOTE: The FPU usually examines the operands of floating-point
operations before execution actually begins to determine (based on the
exponent values of the operands) if the operation can possibly cause an
exception. If there is a possibility of an instruction causing an
exception trap, then the FPU uses a coprocessor stall mechanism to
execute the instruction. It is impossible, however, for the FPU to
predetermine if an instruction will produce an inexact result. Therefore,
if Inexact exception traps are enabled, the FPU uses the coprocessor stall
mechanism to execute all floating-point operations that require more
than one cycle. Since this mode of execution can impact performance,
Inexact exception traps should be enabled only when necessary.

Trap Enabled Results: If Inexact exception traps are enabled, the
result register is not modified and the source registers are preserved.

Trap Disabled Results: The rounded or overflowed result is delivered
to the destination register if no other software trap occurs.

Invalid Operation Exception (V)

The Invalid Operation exception is signaled if one or both of the 
operands are invalid for an implemented operation. When the
exception occurs without a trap, the MIPS ISA defines the result as a 
quiet Not a Number (NaN). The invalid operations are: 

• Addition or subtraction: magnitude subtraction of 
infinities, such as: 

(+∞) + (-∞) or (-∞) - (-∞)

• Multiplication: 0 times ∞, with any signs.

• Division: 0/0, or ∞/∞, with any signs.

• Conversion of a floating-point number to a fixed-point
format when an overflow, or operand value of infinity or
NaN, precludes a faithful representation in that format.

• Comparison of predicates involving < or > without ?, when 
the operands are unordered. 

• Any arithmetic operation on a signaling NaN. A move 
(MOV) operation is not considered to be an arithmetic 
operation, but absolute value (ABS) and negate (NEG) are 
considered to be arithmetic operations and will cause this 
exception if one or both operands is a signaling NaN. 

• Square root: √x, where x is less than zero.

Software can simulate the Invalid Operation exception for other 
operations that are invalid for the given source operands. Examples of 
these operations include IEEE 754-specified functions implemented in 
software, such as Remainder: x REM y, where y is zero or x is infinite; 
conversion of a floating-point number to a decimal format whose 
value causes an overflow, or is infinity or NaN; and transcendental 
functions, such as ln(-5) or cos⁻¹(3). Refer to Appendix B for
examples or for routines to handle these cases.

Trap Enabled Results: The original operand values are undisturbed.

Trap Disabled Results: The FPU always signals an Unimplemented
exception because it does not create the NaN that the IEEE standard
specifies should be returned under these circumstances.

Division-by-Zero Exception (Z)

The Division-by-Zero exception is signaled on an implemented divide
operation if the divisor is zero and the dividend is a finite non-zero
number. Software can simulate this exception for other operations that
produce a signed infinity, such as ln(0), sec(π/2), csc(0), or 0⁻¹.

Trap Enabled Results: The result register is not modified, and the
source registers are preserved.

Trap Disabled Results: The result, when no trap occurs, is a correctly 
signed infinity. 

Overflow Exception (O) 

The Overflow exception is signaled when the magnitude of the 
rounded floating-point result, if the exponent range were to be 
unbounded, is larger than the destination format's largest finite 
number. (This exception also sets the Inexact exception and Flag bits.)

Trap Enabled Results: The result register is not modified, and the
source registers are preserved.

Trap Disabled Results: The result, when no trap occurs, is 
determined by the rounding mode and the sign of the intermediate 
result (as listed in Table 7-1). 
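
The overflow rows of Table 7-1 can be written out as a lookup. This is a minimal sketch assuming the rounding-mode abbreviations RN, RZ, RP, and RM used in that table; the function name and the "S"/"D" format keys are hypothetical, and the constants are the largest finite IEEE 754 single and double values.

```python
import math
import struct

# Largest finite single- and double-precision magnitudes.
MAX_FINITE = {
    "S": struct.unpack(">f", bytes.fromhex("7f7fffff"))[0],
    "D": struct.unpack(">d", bytes.fromhex("7fefffffffffffff"))[0],
}


def overflow_default(rounding_mode: str, negative: bool, fmt: str = "D") -> float:
    """Substitute value for an untrapped overflow, following Table 7-1.

    rounding_mode is one of "RN", "RZ", "RP", "RM"; `negative` is the sign
    of the intermediate result.
    """
    sign = -1.0 if negative else 1.0
    if rounding_mode == "RN":
        return sign * math.inf
    if rounding_mode == "RZ":
        return sign * MAX_FINITE[fmt]
    if rounding_mode == "RP":
        return -MAX_FINITE[fmt] if negative else math.inf
    if rounding_mode == "RM":
        return -math.inf if negative else MAX_FINITE[fmt]
    raise ValueError(f"unknown rounding mode: {rounding_mode}")


print(overflow_default("RN", negative=True))   # -inf
print(overflow_default("RM", negative=False))  # 1.7976931348623157e+308
```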

Underflow Exception (U) 

Two related events contribute to the Underflow exception: 

• The creation of a tiny non-zero result between ±2^Emin
which, because it is so tiny, can cause some later exception.

• The extraordinary loss of accuracy during the 
approximation of such tiny numbers by denormalized 
numbers. 

IEEE Standard 754 permits a choice in the manner in which these
events are detected but requires they be detected the same way for all
operations.

The IEEE standard specifies that tininess may be detected either:

• after rounding (when a nonzero result, computed as
though the exponent range were unbounded, would lie
strictly between ±2^Emin), or

• before rounding (when a nonzero result, computed as
though the exponent range and the precision were
unbounded, would lie strictly between ±2^Emin).

The MIPS architecture requires that tininess be detected after
rounding.

Loss of accuracy can be detected as either: 

• denormalization loss (when the delivered result differs 
from what would have been computed if the exponent 
range were unbounded), or 

• inexact result (when the delivered result differs from what 
would have been computed if the exponent range and 
precision were both unbounded). 

The MIPS architecture requires that loss of accuracy be detected as
inexact result.

Trap Enabled Results: When an underflow trap is enabled,
underflow is signaled when tininess is detected regardless of loss of
accuracy. If underflow traps are enabled, the result register is not
modified, and the source registers are preserved.

Trap Disabled Results: When an underflow trap is not enabled,
underflow is signaled (using the underflow flag) only when both
tininess and loss of accuracy have been detected. The delivered result
might be zero, denormalized, or ±2^Emin.
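
The difference between the trap-enabled and trap-disabled cases comes down to a two-line decision. The sketch below simply restates the two paragraphs above as a predicate; the function and parameter names are hypothetical, and the detection of tininess and inexactness themselves is assumed to have been done elsewhere.

```python
def underflow_signaled(tiny: bool, inexact: bool, trap_enabled: bool) -> bool:
    """Decide whether underflow is signaled, per the rules above.

    tiny:    tininess was detected (after rounding, as MIPS requires)
    inexact: loss of accuracy was detected (as an inexact result)
    """
    if trap_enabled:
        return tiny              # loss of accuracy is irrelevant
    return tiny and inexact      # flag only when both were detected


# A tiny result that is exactly representable as a denormalized number:
print(underflow_signaled(tiny=True, inexact=False, trap_enabled=True))   # True
print(underflow_signaled(tiny=True, inexact=False, trap_enabled=False))  # False
```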

Unimplemented Instruction Exception (E) 

Any attempt to execute an instruction with an operation code or 
format code that has been reserved for future definition sets the 
Unimplemented cause bit and traps. The operand and destination 
registers remain undisturbed and the instruction is emulated in 
software. Any of the IEEE 754 exceptions can arise from the emulated 
operation, and these exceptions in turn are simulated. 

The Unimplemented Instruction exception can also be signaled when 
unusual operands or result conditions are detected that the 
implemented hardware cannot properly handle. These include: 

• Denormalized operand 

• Not a Number operand 

• Denormalized result 

• Underflow

• Reserved opcodes 

• Unimplemented formats 

• Operations which are invalid for their format (for instance,
CVT.S.S)

NOTE: Denormalized and NaN operands are only trapped if the
instruction is a convert or computational operation. Moves do not trap
if their operands are either denormalized or NaNs.

The use of this exception for such conditions is optional; most of these
conditions are newly developed and are not expected to be widely
used in early implementations. Loopholes are provided in the
architecture so that these conditions can be implemented with
assistance provided by software, maintaining full compatibility with
the IEEE standard.

Trap Enabled Results: The original operand values are undisturbed.

Trap Disabled Results: This trap cannot be disabled.

Saving and Restoring State 

Sixteen doubleword coprocessor load or store operations save or 
restore the coprocessor floating-point register state in memory. The 
remaining control and status information can be saved or restored 
through Move To/From Coprocessor Control Register instructions, 
and saving and restoring the processor registers. Normally, the
Control/Status register is saved first and restored last.

When the coprocessor Control/Status register (FCR31) is read, and the
coprocessor is executing one or more floating-point instructions, the
instruction(s) in progress are either completed or reported as
exceptions. The architecture requires that no more than one of these 
pending instructions can cause an exception. If one of the pending 
instructions cannot be completed, the instruction is placed in the 
Exception register, if present, and information indicating the type of 
exception is placed in the Control/Status register. State information in 
the status word indicates that exceptions are pending when state is 
restored. 

Writing a zero value to the Cause field of Control register 31 clears all 
pending exceptions, permitting normal processing to be restarted 
after the floating-point register state is restored. 

The Cause field of the Control/Status register holds the results of only
one instruction; the FPU examines source operands before an 
operation is initiated to determine if the instruction can possibly cause 
an exception. If an exception is possible, the FPU executes the 
instruction in stall mode to ensure that no more than one instruction 
(that might cause an exception) is executed at a time. 

Trap Handlers for IEEE Standard Exceptions

IEEE Standard 754 strongly recommends that users be allowed to
specify a trap handler for any of the five standard exceptions; the
handler can compute or specify a substitute result to be placed in the
destination register of the operation.

By retrieving an instruction using the processor EPC register, the trap 
handler determines: 

• The exceptions occurring during the operation

• The operation being performed 

• The destination format 

On Overflow or Underflow exceptions (except for conversions), and 
on Inexact exceptions, the trap handler gains access to the correctly 
rounded result by examining the source registers and simulating the 
operation in software. 

On Overflow or Underflow exceptions encountered on floating-point 
conversions, and on Invalid Operation and Divide-by-Zero 
exceptions, the trap handler gains access to the operand values by 
examining the source registers of the instruction.

The IEEE Standard 754 recommends that, if enabled, the overflow and
underflow traps take precedence over a separate inexact trap. This 
prioritization is accomplished in software; hardware sets the bits for 
both the Inexact exception and the Overflow or Underflow exception. 

This chapter describes the signals used by and in conjunction with the
R4000. The signals discussed include: 

• System Interface 

• Clock/Control Interface 

• Secondary Cache Interface 

• Interrupt Interface 

• Initialization Interface 

• JTAG Interface 

System Interface 

These signals comprise the interface between the R4000 and other 
components in the system. Signals IvdAck* and IvdErr* are available 
only on the R4000SC and MC. All other signals are available on all 
three package configurations. 

ExtRqst*: External request Input 

An external agent asserts ExtRqst* to request use 
of the system interface. The R4000 grants the 
request by asserting Release*. 

IvdAck*: Invalidate acknowledge Input 

An external agent asserts IvdAck* to signal 
successful completion of a processor invalidate 
or update request (R4000MC and SC only). 

IvdErr*: Invalidate error Input 

An external agent asserts IvdErr* to signal 
unsuccessful completion of a processor invalidate 
or update request (R4000MC and SC only). 

Release*: Release interface Output

In response to the assertion of ExtRqst*, the
R4000 asserts Release* to signal the requesting
device that the system interface is available.

RdRdy*: Read ready Input

The external agent asserts RdRdy* to indicate
that it can accept processor read, invalidate, or
update requests in both secondary-cache and
no-secondary-cache mode or can accept a read
followed by write request, a read followed by a
potential update request, or a read followed by a
potential update followed by a write request in
secondary cache mode.

SysAD(63:0): System address/data bus Input/Output

A 64-bit address and data bus for communication
between the processor and an external agent.

SysADC(7:0): System address/data check bus Input/Output

An 8-bit bus containing check bits for the SysAD
bus.

SysCmd(8:0): System command/data identifier bus Input/Output

A 9-bit bus for command and data identifier
transmission between the processor and an
external agent.

SysCmdP: System command/data identifier bus parity Input/Output

A single, even-parity bit for the SysCmd bus.

ValidIn*: Valid input Input

An external agent asserts ValidIn* when it is
driving a valid address or data on the SysAD bus
and a valid command or data identifier on the
SysCmd bus.

ValidOut*: Valid output Output

The R4000 asserts ValidOut* when it is driving a
valid address or data on the SysAD bus and a
valid command or data identifier on the SysCmd
bus.

WrRdy*: Write ready Input

An external agent asserts WrRdy* when it can
accept a processor write request.
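
The even-parity relationship between SysCmdP and the 9-bit SysCmd bus described above can be sketched as follows. This is an illustrative model of even parity in general, not a timing-accurate description of the bus; the function names are hypothetical.

```python
def even_parity_bit(value: int, width: int) -> int:
    """Return the bit that makes the total count of 1s (data + parity) even."""
    data = value & ((1 << width) - 1)
    return bin(data).count("1") & 1


def syscmd_parity_ok(syscmd: int, syscmdp: int) -> bool:
    """Check a 9-bit SysCmd value against its SysCmdP bit."""
    return even_parity_bit(syscmd, 9) == (syscmdp & 1)


print(even_parity_bit(0b000000111, 9))   # 1 (three 1s, so parity bit is 1)
print(syscmd_parity_ok(0b000000101, 0))  # True (two 1s, parity bit 0)
```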

Clock/Control Interface



These signals comprise the interface for clocking and maintenance
functions.

IOOut: I/O output Output

Output slew rate control feedback loop output.
Must be connected to IOIn through a delay loop
that models the IO path from the R4000 to an
external agent.

IOIn: I/O input Input

Output slew rate control feedback loop input (see
IOOut).

MasterClock: Master clock Input

Master clock input establishes the processor
operating frequency.

MasterOut: Master clock out Output

Master clock output aligned with MasterClock.

RClock(1:0): Receive clocks Output

Two identical receive clocks that establish the
system interface frequency.

SyncOut: Synchronization clock out Output

Synchronization clock output. Must be connected
to SyncIn through an interconnect that models
the interconnect between MasterOut, TClock,
RClock, and the external agent.

SyncIn: Synchronization clock in Input

Synchronization clock input.

TClock(1:0): Transmit clocks Output

Two identical transmit clocks that establish the
system interface frequency.

Fault*: Fault Output

The R4000 asserts Fault* to indicate a mismatch
output of boundary comparators.

Status(7:0): Status Output

An 8-bit bus that indicates the current operation
status of the processor.

VccP: Quiet VCC for PLL Input

Quiet Vcc for the internal phase locked loop.

VccSense: VCC sense Input/Output

This is a special pin used only in component
testing and characterization. It provides a
separate, direct connection from the on-chip VCC
node to a package pin without attaching to the
in-package power planes. Test fixtures treat
VccSense as an analog output pin: the voltage at
this pin directly shows the behavior of the
on-chip VCC. Thus, characterization engineers can
easily observe the effects of di/dt noise,
transmission line reflections, etc. VccSense
should be connected to VCC in functional system
designs.

VssP: Quiet VSS for PLL Input

Quiet Vss for the internal phase locked loop.

VssSense: VSS sense Input/Output

VssSense provides a separate, direct connection
from the on-chip VSS node to a package pin
without attaching to the in-package ground
planes. VssSense should be connected to VSS in
functional system designs.



Secondary Cache Interface 



These signals comprise the interface between the R4000 and the
secondary cache. These signals are available only on the R4000MC and
SC.

SCAddr(17:1): Secondary cache address bus Output

SCAddr0W: Secondary cache address lsb Output

SCAddr0X: Secondary cache address lsb Output

SCAddr0Y: Secondary cache address lsb Output

SCAddr0Z: Secondary cache address lsb Output

The 18-bit address bus for the secondary cache.
Bit 0 has four output lines to provide additional
drive current.

SCAPar(2:0): Secondary cache address parity bus Output

A 3-bit bus that carries the parity of the SCAddr
bus and the cache control lines SCWR*, SCDCS*
and SCTCS*. The individual bit definitions are:

SCAPar2 - Even parity for SCAddr(17:12) and SCWR*
SCAPar1 - Even parity for SCAddr(11:6) and SCDCS*
SCAPar0 - Even parity for SCAddr(5:0) and SCTCS*

SCData(127:0): Secondary cache data bus Input/Output

A 128-bit bus used to read or write cache data
from and to the secondary cache data RAM.

SCDChk(15:0): Secondary cache data ECC bus Input/Output

A 16-bit bus that carries two 8-bit ECC fields
covering the 128 bits of SCData from/to
secondary cache. SCDChk(15:8) corresponds to
SCData(127:64) and SCDChk(7:0) corresponds to
SCData(63:0).

SCDCS*: Secondary cache data chip select Output

Chip select enable signal for the secondary cache
data RAM.

SCOE*: Secondary cache output enable Output

Output enable for the secondary cache data and
tag RAM.

SCTag(24:0): Secondary cache tag bus Input/Output

A 25-bit bus used to read or write cache tags
from and to the secondary cache.

SCTChk(6:0): Secondary cache tag ECC bus Input/Output

A 7-bit bus that carries an Error Checking and
Correcting (ECC) field covering the SCTag from
and to the secondary cache.

SCTCS*: Secondary cache tag chip select Output

Chip select enable signal for the secondary cache
tag RAM.

SCWrW*: Secondary cache write enable Output

SCWrX*: Secondary cache write enable Output

SCWrY*: Secondary cache write enable Output

SCWrZ*: Secondary cache write enable Output

Write enable for the secondary cache data and
tag RAM.

Interrupt Interface

These signals comprise the interface used by external agents to 
interrupt the R4000 processor. Int*(5:1) is available only on the
R4000PC; Int*(0) and NMI* are available on all three configurations.

Int*(5:1): Interrupt Input

Five of six general processor interrupts, bit-wise
ORed with bits 5:1 of the interrupt register.

Int*(0): Interrupt Input

One of six general processor interrupts, bit-wise
ORed with bit 0 of the interrupt register.

NMI*: Non-maskable interrupt Input 

Non-maskable interrupt, ORed with bit 6 of the 
interrupt register. 
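
The OR relationship between the interrupt pins and the interrupt register can be sketched as follows. This is only an illustration: the function name is hypothetical, and it assumes that an asserted (low) pin contributes a 1 to the corresponding register bit.

```python
def effective_interrupts(interrupt_register, int_pin_levels, nmi_pin_level):
    """Combine the interrupt register with the interrupt pins.

    int_pin_levels: 6-bit value holding the logic levels of Int*(5:0).
    nmi_pin_level:  logic level of NMI*.
    Assumption: the pins are active low, so a 0 level means asserted.
    """
    asserted = (~int_pin_levels) & 0x3F          # bits 5:0 from Int*(5:0)
    nmi = (0 if nmi_pin_level else 1) << 6       # bit 6 from NMI*
    return (interrupt_register & 0x7F) | asserted | nmi
```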

Initialization Interface 

These signals comprise the interface by which an external agent 
initializes the R4000 operating parameters. All of these signals are 
available on all three processor configurations. 

ColdReset*: Cold reset Input 

This signal must be asserted for a power-on reset
or a cold reset. The clocks SClock, TClock, and
RClock begin to cycle and are synchronized with
the deassertion edge of ColdReset*. ColdReset*
must be deasserted synchronously with
MasterOut.

ModeClock: Boot mode clock Output 

Serial boot-mode data clock output at the system 
clock frequency divided by two hundred and 
fifty six. 

ModeIn: Boot mode data in Input

Serial boot-mode data input. 

Reset*: Reset Input

This signal must be asserted for any reset
sequence. It may be asserted synchronously or
asynchronously for a cold reset, or
synchronously to initiate a warm reset. Reset*
must be deasserted synchronously with
MasterOut.

VCCOk: VCC is OK Input

When asserted, this signal indicates to the R4000 
that the +5 volt power supply has been above 
4.75 volts for more than 100 milliseconds and will 
remain stable. The assertion of VCCOk initiates 
the initialization sequence. 



JTAG Interface 



These signals comprise the interface by which the JTAG boundary 
scan mechanism is provided. 

JTDI: JTAG data in Input 

Data is serially scanned in through this pin. 

JTCK: JTAG clock input Input

The R4000 receives a serial clock on JTCK. On the
rising edge of JTCK, both JTDI and JTMS are
sampled.

JTDO: JTAG data out Output

Data is serially scanned out through this pin. 

JTMS: JTAG command Input 

JTAG command signal; indicates that the incoming
serial data is command data.

Table 8-1: R4000 SC/MC Processor Signal Summary

Description                                Name           I/O  Asserted State  3-State

Secondary cache data bus                   SCData(127:0)  I/O  High            Yes
Secondary cache data ECC bus               SCDChk(15:0)   I/O  High            Yes
Secondary cache tag bus                    SCTag(24:0)    I/O  High            Yes
Secondary cache tag ECC bus                SCTChk(6:0)    I/O  High            Yes
Secondary cache address bus                SCAddr(17:1)   O    High            No
Secondary cache address lsb                SCAddr0Z       O    High            No
Secondary cache address lsb                SCAddr0Y       O    High            No
Secondary cache address lsb                SCAddr0X       O    High            No
Secondary cache address lsb                SCAddr0W       O    High            No
Secondary cache address parity bus         SCAPar(2:0)    O    High            No
Secondary cache output enable              SCOE*          O    Low             No
Secondary cache write enable               SCWrZ*         O    Low             No
Secondary cache write enable               SCWrY*         O    Low             No
Secondary cache write enable               SCWrX*         O    Low             No
Secondary cache write enable               SCWrW*         O    Low             No
Secondary cache data chip select           SCDCS*         O    Low             No
Secondary cache tag chip select            SCTCS*         O    Low             No
System address/data bus                    SysAD(63:0)    I/O  High            Yes
System address/data check bus              SysADC(7:0)    I/O  High            Yes
System command/data identifier bus         SysCmd(8:0)    I/O  High            Yes
System command/data identifier bus parity  SysCmdP        I/O  High            Yes
Valid input                                ValidIn*       I    Low             No
Valid output                               ValidOut*      O    Low             Yes
External request                           ExtRqst*       I    Low             No
Release interface                          Release*       O    Low             No
Read ready                                 RdRdy*         I    Low             No
Write ready                                WrRdy*         I    Low             No
Invalidate acknowledge                     IvdAck*        I    Low             No
Invalidate error                           IvdErr*        I    Low             No
Interrupt                                  Int*(0)        I    Low             No
Non-maskable interrupt                     NMI*           I    Low             No
Boot mode data in                          ModeIn         I    High            No
Boot mode clock                            ModeClock      O    High            No
JTAG data in                               JTDI           I    High            No
JTAG data out                              JTDO           O    High            No
JTAG command                               JTMS           I    High            No
JTAG clock input                           JTCK           I    High            No
Transmit clocks                            TClock(1:0)    O    High            No
Receive clocks                             RClock(1:0)    O    High            No
Master clock                               MasterClock    I    High            No
Master clock out                           MasterOut      O    High            No
Synchronization clock out                  SyncOut        O    High            No
Synchronization clock in                   SyncIn         I    High            No
I/O output                                 IOOut          O    High            No
I/O input                                  IOIn           I    High            No
VCC is OK                                  VCCOk          I    High            No
Cold reset                                 ColdReset*     I    Low             No
Reset                                      Reset*         I    Low             No
Fault                                      Fault*         O    Low             No
Quiet VCC for PLL                          VccP           I    High            No
Quiet VSS for PLL                          VssP           I    High            No
Status                                     Status(7:0)    O    High            No
VCC sense                                  VccSense       I/O  N/A             No
VSS sense                                  VssSense       I/O  N/A             No

Table 8-2: R4000 PC Processor Signal Summary

Description                                Name           I/O  Asserted State  3-State

System address/data bus                    SysAD(63:0)    I/O  High            Yes
System address/data check bus              SysADC(7:0)    I/O  High            Yes
System command/data identifier bus         SysCmd(8:0)    I/O  High            Yes
System command/data identifier bus parity  SysCmdP        I/O  High            Yes
Valid input                                ValidIn*       I    Low             No
Valid output                               ValidOut*      O    Low             Yes
External request                           ExtRqst*       I    Low             No
Release interface                          Release*       O    Low             No
Read ready                                 RdRdy*         I    Low             No
Write ready                                WrRdy*         I    Low             No
Interrupts                                 Int*(5:1)      I    Low             No
Interrupt                                  Int*(0)        I    Low             No
Non-maskable interrupt                     NMI*           I    Low             No
Boot mode data in                          ModeIn         I    High            No
Boot mode clock                            ModeClock      O    High            No
JTAG data in                               JTDI           I    High            No
JTAG data out                              JTDO           O    High            No
JTAG command                               JTMS           I    High            No
JTAG clock input                           JTCK           I    High            No
Transmit clocks                            TClock(1:0)    O    High            No
Receive clocks                             RClock(1:0)    O    High            No
Master clock                               MasterClock    I    High            No
Master clock out                           MasterOut      O    High            No
Synchronization clock out                  SyncOut        O    High            No
Synchronization clock in                   SyncIn         I    High            No
I/O output                                 IOOut          O    High            No
I/O input                                  IOIn           I    High            No
VCC is OK                                  VCCOk          I    High            No
Cold reset                                 ColdReset*     I    Low             No
Reset                                      Reset*         I    Low             No
Fault                                      Fault*         O    Low             No
Quiet VCC for PLL                          VccP           I    High            No
Quiet VSS for PLL                          VssP           I    High            No

The system interface allows the processor to access those external
resources required to satisfy cache misses, while also permitting an 
external agent access to certain of the processor's internal resources. 
In the R4000MC configuration, the system interface provides those 
processor mechanisms necessary to maintain the cache coherency of 
shared data, while also providing to an external agent the mechanisms 
with which to maintain system-wide multiprocessor cache coherency. 
This section describes the system interface from the point of view of 
both the processor and the external agent.

System Events 

First, a definition: A system event is an event that occurs within the
processor and requires access to external system resources.
When a system event occurs, the processor issues either a single
request or a series of requests, called processor requests, through the
system interface in order to access an external resource and service
the event. For this to work, the processor's system interface must be
connected to an external agent that is compatible with the system
interface protocol and that can coordinate access to system resources.

System events may be: 

• A load that misses in both the primary and secondary 
caches. 

• A store that misses in both the primary and secondary 
caches. 

• A store that hits in either the primary or secondary data
cache on a shared line.

• An uncached load or store.

Note that a miss in both caches requires the write back to memory of
the cache line being replaced, if the line is in a dirty cache state.
Under certain conditions, system events are also caused by cache 
operation instructions. 

Two types of system events are described: processor requests and 
external requests. 

Processor Requests 

A processor request is a request or a series of requests, through the
system interface, to access some external resource.
Processor requests include read, write, null write, invalidate, and
update.

• Read is a request for a block, double word, word, or partial 
word of data either from main memory or from another 
system resource. 

• Write provides a block, double word, word, or partial word 
of data to be written either to main memory or to another 
system resource. 

• Null write indicates that an expected write has been 
cancelled as a result of an external request. 

• Invalidate is a request to invalidate a specified cache line in 
every other cache in the system. 

• Update is a request to update every other cache in the 
system with the specified double word, word, or partial 
word of data. 

External Requests 

An external agent requesting access to processor caches or to a 
processor status register generates an external request. This access 
request passes through the system interface. 

External requests include read, write, invalidate, update, snoop, 
intervention, and null requests. External invalidate, update, snoop 
and intervention requests, as a group, are referred to as external 
coherence requests. 

• Read is a request for a word of data from a processor 
internal resource. 

• Write provides a word of data to be written to a processor
internal resource.

• Invalidate specifies a cache line, in the processor's primary
and secondary caches, that must be marked invalid.

• Update provides a double word, word, or partial word of
data to be written to the processor's primary and
secondary caches.

• Snoop checks the processor secondary cache to see if a valid 
copy of a particular cache line exists; if the valid copy 
exists, it then checks to see what cache state the line is in. 
The processor returns the state of the cache line at the 
specified physical address in the secondary cache, and may 
modify the state of the cache line. 

• Intervention requires the processor to return an indication 
of the state of the cache line at the specified physical 
address in the secondary cache. Under certain conditions 
related to the state of the cache line and the nature of the 
intervention request, the contents of the primary and 
secondary cache line may themselves be returned, or the 
state of the line may itself be modified. 

• Null requests require no action by the processor. They 
simply provide a mechanism for an external agent to either 
return control of the secondary cache to the R4000, or to 
return the system interface to the master state without 
affecting the processor. 
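
The two request families listed above can be summarized in a small Python sketch. The enum and helper names are hypothetical and chosen only for illustration; the grouping of external coherence requests follows the text.

```python
from enum import Enum


class ProcessorRequest(Enum):
    READ = "read"
    WRITE = "write"
    NULL_WRITE = "null write"
    INVALIDATE = "invalidate"
    UPDATE = "update"


class ExternalRequest(Enum):
    READ = "read"
    WRITE = "write"
    INVALIDATE = "invalidate"
    UPDATE = "update"
    SNOOP = "snoop"
    INTERVENTION = "intervention"
    NULL = "null"


# External invalidate, update, snoop and intervention requests are
# referred to as a group as external coherence requests.
EXTERNAL_COHERENCE_REQUESTS = frozenset({
    ExternalRequest.INVALIDATE,
    ExternalRequest.UPDATE,
    ExternalRequest.SNOOP,
    ExternalRequest.INTERVENTION,
})


def is_coherence_request(request):
    """Return True if an external request is an external coherence request."""
    return request in EXTERNAL_COHERENCE_REQUESTS
```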

Read Requests 

There are two types of read requests: processor and external. When a 
processor or an external agent receives a read request, it must access 
the specified resource and return the requested data. 

• A processor read request may be split from the external
agent's return of the requested data; the response
(requested) data may be returned at any time after the read
request, provided the system interface bus is not being
used. An external agent may even initiate an unrelated
external request before it returns the response data for a
processor read. A processor read request is complete after
the last word of response data has been received from the
external agent.

• For external read requests, the data is returned directly in
response to the read request. External read requests may
not be split from the return of response data. An external
read request is complete after the processor returns the
requested word of data.

Pending Read Requests 

Processor read requests that have been issued, but for which data has 
not yet been returned, are said to be pending. A read remains pending 
until the requested read data is returned. 

Read Responses 

The return of data in response to a processor read request is 
accomplished through a read response. While a read response is
technically an external request, read responses have one characteristic
that distinguishes them from all other external requests: system interface
arbitration is not performed. For this reason, read responses are
handled separately from all other external requests, and are simply 
called read responses. 

Write Requests 

A processor write request is complete after the last word of data has 
been transmitted. An external write request is complete after the word 
of data has been transmitted. 

Update and Invalidate Requests 

A processor update request requires a completion acknowledge by the
invalidate acknowledge signal IvdAck* or the invalidate error signal
IvdErr*, unless the update is canceled by the external agent.
Update cancellation is signaled to the processor during external
invalidate, update, snoop, and intervention requests; IvdErr* is used
to signal that a processor update request has failed. When the
processor update request fails, the issuing processor takes a bus error
on the store instruction that generated the failed request.

Since the completion acknowledge for processor invalidate and
update requests is signaled through the system interface on dedicated
pins, the completion acknowledge may occur in parallel with
processor and external requests. A processor update request that has
been submitted, but for which the processor has not yet received an
acknowledge or a cancellation, is said to be unacknowledged.

An external update request is complete after the request has been
transmitted.

Snoop Requests 

An external snoop request is complete after the processor returns the 
state of the specified cache line. 

Intervention Requests 

An external intervention request is complete after the processor 
returns the state of the specified cache line, if the processor does not 
return the contents of the cache line, or after the processor returns the 
last word of data for the specified cache line. 
Note that the data identifier associated with the response data may
signal that the returned data is erroneous, causing the processor to
take a bus error.

Flow Control for Requests 

The processor must manage the flow of processor requests and
external requests. The processor controls the flow of external requests
by the external request arbitration signals ExtRqst* and Release*. An
external agent must acquire mastership of the system interface before
it is allowed to issue an external request. The external agent arbitrates
for mastership of the system interface by asserting ExtRqst* and
waiting for the processor to assert Release* for one cycle. Mastership
of the system interface is always returned to the processor after an
external request has been issued. The processor will not accept a
subsequent external request until it has completed the current one.

Processor requests are managed by the processor in two distinct
modes: secondary-cache mode and no-secondary-cache mode. These modes
are programmable through the boot-time mode control interface
described in Chapter 12. The allowed modes of operation are
dependent on the package configuration for the processor. A
processor in the small configuration package must be programmed to
run in no-secondary-cache mode. A processor in the large
configuration package may be programmed to run in secondary-cache
or no-secondary-cache mode. If not programmed appropriately, the
behavior of the processor is undefined.
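
The ExtRqst*/Release* arbitration described above can be pictured with a toy Python model. The class name, the method names, and the processor_can_release flag (standing in for whatever internal conditions let the processor give up the interface) are assumptions made for this sketch and do not describe the actual hardware.

```python
class SystemInterfaceArbiter:
    """Toy model of mastership hand-off on the system interface."""

    def __init__(self):
        # The processor is master unless it has released the interface.
        self.master = "processor"

    def cycle(self, ext_rqst, processor_can_release):
        """Advance one cycle; return True when Release* is asserted."""
        if self.master == "processor" and ext_rqst and processor_can_release:
            self.master = "external agent"
            return True   # Release* is asserted for this one cycle
        return False

    def external_request_issued(self):
        """Mastership always returns to the processor after an external request."""
        self.master = "processor"
```

In this model the external agent keeps ExtRqst* asserted and calls cycle() once per clock until it sees Release*; only then may it issue its request.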

In no-secondary-cache mode, the processor will issue requests in a 
strict sequential fashion; that is, the processor is only allowed to have 
one request pending at any time. The processor will issue a read
request and wait for a read response before issuing any subsequent
requests. The processor will submit a write request only if there are no
reads pending. 

The processor provides the input signals RdRdy* and WrRdy* to 
allow an external agent to manage the flow of processor requests. 
RdRdy* controls the flow of processor read, invalidate, and update 
requests while WrRdy* controls the flow of processor write requests. 
Processor null write requests must always be accepted and cannot be 
delayed by either RdRdy* or WrRdy*. The processor samples the 
signal RdRdy* to determine the issue cycle for a processor read, 
invalidate, or update request and the processor samples the signal 
WrRdy* to determine the issue cycle of a processor write request. The
issue cycle for a processor read, invalidate, or update request is 
defined to be the first address cycle for the request for which the signal 
RdRdy* was asserted two cycles previously. The issue cycle for a 
processor write request is defined to be the first address cycle for the 
write request for which the signal WrRdy* was asserted two cycles 
previously. If the processor wishes to issue a request but is unable to 
because one of the signals RdRdy* or WrRdy* is deasserted, the 
processor will repeat the address cycle for the request until the issue 
cycle is accomplished. Once the issue cycle is accomplished, data 
transmission will begin for a request that includes data. There will 
always be one and only one issue cycle for any processor request.

The processor will accept external requests while attempting to issue
a processor request by releasing the system interface to slave state in 
response to an assertion of ExtRqst*. Note that the rules governing the 
issue cycle of a processor request are strictly applied to determine the 
action the processor is taking. The processor will either accomplish the 
issue of the processor request, in which case the processor request will 
be completed in its entirety before an external request will be 
accepted, or the processor will release the system interface to slave 
state without accomplishing the issue of the processor request. In the 
latter case, the processor will attempt to issue the processor request 
again after the external request is completed, and the rules governing 
issue cycle will again apply. 
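
The issue-cycle rule (the first address cycle for which the ready signal was asserted two cycles previously) can be expressed as a short Python sketch. The function name and the list-of-booleans representation of the sampled signal are assumptions for illustration; the same rule applies to RdRdy* for read, invalidate, and update requests and to WrRdy* for write requests.

```python
def find_issue_cycle(ready_asserted, first_address_cycle):
    """Return the issue cycle of a processor request, or None.

    ready_asserted[c] is True if RdRdy* (or WrRdy*) was asserted in cycle c.
    The processor repeats the address cycle every cycle starting at
    first_address_cycle; the issue cycle is the first such cycle c for
    which the ready signal was asserted in cycle c - 2.
    """
    start = max(first_address_cycle, 2)  # cycles 0 and 1 have no c - 2 sample
    for c in range(start, len(ready_asserted) + 2):
        if ready_asserted[c - 2]:
            return c
    return None
```

For example, with the ready signal first asserted in cycle 2 and the first address cycle also in cycle 2, the address cycle is repeated in cycles 2 and 3 and the issue cycle is cycle 4.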

In no-secondary-cache mode an external agent must be capable of 
accepting a processor read request at any time there are no processor 
read requests pending and the signal RdRdy* has been asserted for 
two or more cycles. An external agent must be capable of accepting a 
processor write request at any time there are no processor read 
requests pending and the signal WrRdy* has been asserted for two or 
more cycles.

In secondary-cache mode, the processor issues requests both
individually, as in no-secondary-cache mode, and in groups called
clusters. A cluster consists of a processor read request followed by
one or two additional processor requests issued while the read request
is pending. All of the requests that are part of a cluster must be
accepted before the response to the read request that begins the
cluster may be returned to the processor. A cluster can include:

• a processor read request, followed by a write request

• a processor read request, followed by a potential update request

• a processor read request, followed by a potential update request,
followed by a write request
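
The three permitted cluster shapes can be captured in a small Python check. The request labels and the function name are hypothetical; the set of shapes is taken directly from the list above.

```python
# The three cluster shapes listed in the text, as ordered tuples.
VALID_CLUSTERS = {
    ("read", "write"),
    ("read", "potential update"),
    ("read", "potential update", "write"),
}


def is_valid_cluster(requests):
    """Return True if the ordered request sequence is a permitted cluster."""
    return tuple(requests) in VALID_CLUSTERS
```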

The issue of potential update requests within a cluster can be disabled 
via the boot-time mode control interface. A processor potential update 
request is defined as any update request that is issued while a 
processor read request is pending. In addition, a bit in the command 
for processor updates identifies potential updates. Potential updates 
are issued in conjunction with a processor read request. That is, once
the processor accomplishes the issue of a read request, a potential 
update request follows if one is required regardless of the state of 
RdRdy*. Potential update requests do not obey the RdRdy* flow 
control rules for issue, but rather issue with a single address cycle 
regardless of the state of RdRdy*. 

A write request that is part of a cluster does obey the WrRdy* rules for 
issue. The processor accepts external requests between the issue of a
processor read request, or a processor read request followed by a
potential update request, and the issue of a processor write request
within a cluster. The processor signals that it is issuing a cluster that
contains a processor write request by issuing a read-with-write-forthcoming
request instead of an ordinary read request to start the
cluster. The read-with-write-forthcoming request is identified by a bit
in the command for processor read requests. The external agent must 
accept all of the requests that form a cluster before it may return a 
response to the read request that began the cluster. The behavior of the 
processor is undefined if the external agent returns a response to a 
processor read request that begins a cluster before accepting all of the 
requests that form the cluster. 

Since the processor does accept external requests between the issue of 
a read-with-write-forthcoming request that begins a cluster and the 
issue of the write request that completes a cluster, it is possible for an 
external request to obviate the need for the write request within the 
cluster. For instance, if the external agent issued an external invalidate
request that targeted the cache line the processor was attempting to
write back, the state of the cache line would be changed to invalid, and
the write back for the cache line would no longer be needed. In this 
event, the processor issues a processor null write request after 
completing the external request to complete the cluster. Processor null 
write requests do not obey the WrRdy* flow control rules for issue, 
but rather issue with a single address cycle regardless of the state of 
WrRdy*. Any external request that changes the state of a cache line 
from dirty exclusive or dirty shared to clean exclusive, shared, or 
invalid obviates the need for a write back of that cache line. 
A processor potential update request remains potential until the 
response to the pending processor read request that began the cluster 
is received. If the read response data is returned in one of the shared 
states, shared or dirty shared, the potential update is no longer 
potential and must receive an acknowledge via either the signal 
IvdAck* or IvdErr*. If the read response data is returned in one of the 
exclusive states, clean exclusive or dirty exclusive, the potential 
update is nullified and the processor neither expects nor requires an 
acknowledge. 
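
The two state-dependent rules above (when a write back is obviated, and how a potential update is resolved by the read response) can be sketched in Python. The state names are written as plain strings and the function names are hypothetical.

```python
SHARED_STATES = {"shared", "dirty shared"}
EXCLUSIVE_STATES = {"clean exclusive", "dirty exclusive"}
DIRTY_STATES = {"dirty exclusive", "dirty shared"}


def write_back_obviated(old_state, new_state):
    """True if an external request's state change removes the need for a write back.

    In that case the processor completes the cluster with a null write.
    """
    return (old_state in DIRTY_STATES
            and new_state in {"clean exclusive", "shared", "invalid"})


def resolve_potential_update(read_response_state):
    """Return what happens to a potential update once the read response arrives."""
    if read_response_state in SHARED_STATES:
        return "compulsory"   # must be acknowledged via IvdAck* or IvdErr*
    if read_response_state in EXCLUSIVE_STATES:
        return "nullified"    # no acknowledge expected or required
    raise ValueError("unexpected cache state: %r" % (read_response_state,))
```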

In secondary-cache mode, an external agent must be capable of 
accepting a processor read request followed by a potential update 
request any time there are no processor read requests pending, no 
unacknowledged processor update requests, and the signal RdRdy* 
has been asserted for two or more cycles. An external agent must be 
capable of accepting a processor write request at any time there are no 
processor read requests pending, or there is a processor
read-with-write-forthcoming request pending with no unacknowledged
processor update requests that are compulsory, and the signal
WrRdy* has been asserted for two or more cycles.

After issuing a processor read request, the processor does not issue a
subsequent read request until it has received a read response for the
read request, whether the read request began a cluster or not. After
issuing a processor update request, or after a potential update request
is no longer potential, the processor does not issue a subsequent
request until it has received an acknowledge for the update request.
After the processor has issued a write request, the processor does not
issue a subsequent request until at least four cycles after the issue cycle
of the write request.
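
The conditions under which an external agent must be able to accept processor requests, in both modes, can be written as two predicates. This is a minimal sketch: the function names, argument names, and mode strings are assumptions, and the cycle counts are taken to be the number of consecutive cycles the ready signal has been asserted.

```python
def must_accept_read(mode, reads_pending, unacked_updates, rdrdy_cycles):
    """True when the external agent must be able to accept a processor read.

    In secondary-cache mode the read may be followed by a potential update.
    """
    if reads_pending > 0 or rdrdy_cycles < 2:
        return False
    if mode == "secondary-cache":
        return unacked_updates == 0
    return True  # no-secondary-cache mode


def must_accept_write(mode, reads_pending, write_forthcoming_pending,
                      unacked_compulsory_updates, wrrdy_cycles):
    """True when the external agent must be able to accept a processor write."""
    if wrrdy_cycles < 2:
        return False
    if reads_pending == 0:
        return True
    # With a read pending, only secondary-cache mode permits a write, and
    # only behind a read-with-write-forthcoming request.
    return (mode == "secondary-cache"
            and write_forthcoming_pending
            and unacked_compulsory_updates == 0)
```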

The following sections detail the sequence, protocol, and syntax of
processor and external requests. Sequence refers to the precise series
of requests that a processor generates to service a system event.
Protocol refers to the cycle-by-cycle signal transitions that occur on the
processor's system interface pins to realize a processor or external
request. Syntax refers to the precise definition of bit patterns on
encoded buses such as the command bus.

Processor Request Sequencing 

The processor generates a request or a series of requests through the
system interface to satisfy system events. Processor requests are
managed in two distinct modes, secondary-cache mode and
no-secondary-cache mode. The following sections detail the sequence of
requests generated by the processor for each system event in
secondary-cache and no-secondary-cache mode.

Primary and Secondary Cache Miss on a Load 

When the processor misses in both the primary and secondary caches 
on a load, it must obtain the cache line that contains the data element 
to be loaded from an external agent before it can proceed. If the new 
cache line will replace a current cache line that is in the state dirty 
exclusive or dirty shared, the current cache line must be written back 
before the new line can be loaded in the primary and secondary 
caches. 

The processor examines the coherency attribute in the TLB entry for
the page that contains the requested cache line. If the coherency
attribute is exclusive, it issues a coherent read request that also
requests exclusivity. If the coherency attribute is sharable or update,
the processor issues a coherent read request. If the coherency
attribute is noncoherent, the processor issues a noncoherent read
request.
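
The mapping from coherency attribute to read request type on a load miss can be sketched as a simple lookup. The attribute strings and the function name are hypothetical labels for the cases named in the text.

```python
def load_miss_read_request(coherency_attribute):
    """Return the kind of read request issued for a load that misses both caches."""
    if coherency_attribute == "exclusive":
        return "coherent read, exclusivity requested"
    if coherency_attribute in ("sharable", "update"):
        return "coherent read"
    if coherency_attribute == "noncoherent":
        return "noncoherent read"
    raise ValueError("unknown coherency attribute: %r" % (coherency_attribute,))
```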

In no-secondary-cache mode, the processor issues a read request for 
the cache line that contains the data element to be loaded. The 
processor then waits for an external agent to provide the read data in 
response to the read request. Then, if the current cache line must be
written back, the processor issues a write request for the current cache 
line. 

In secondary-cache mode, if the current cache line does not need to be
written back and the coherency attribute for the page that contains the
requested cache line is anything other than exclusive, the processor
issues a read request for the cache line that contains the data element
to be loaded. If the current cache line needs to be written back and the
coherency attribute for the requested cache line is not exclusive, the
processor will issue a cluster consisting of a
read-with-write-forthcoming request for the cache line that contains the
data element to be loaded, followed by a write request for the current
cache line. If the current cache line needs to be written back and the
coherency attribute for the page containing the requested cache line is
exclusive, the processor issues a cluster consisting of an exclusive
read-with-write-forthcoming request, followed by a write request for the
current cache line.
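
The load-miss sequencing in the two modes can be summarized in a Python sketch. The function name and request labels are hypothetical. Note one assumption in particular: the text does not spell out the secondary-cache case of an exclusive page with no write back, so the sketch simply issues a single exclusive read there.

```python
def load_miss_requests(mode, needs_write_back, coherency_attribute):
    """Return the ordered processor requests for a load miss in both caches."""
    exclusive = coherency_attribute == "exclusive"

    if mode == "no-secondary-cache":
        # Read first, wait for the response, then write back if required.
        requests = ["read", "wait for read response"]
        if needs_write_back:
            requests.append("write")
        return requests

    if mode != "secondary-cache":
        raise ValueError("unknown mode: %r" % (mode,))

    if not needs_write_back:
        # Assumption: the exclusive, no-write-back case is not described
        # in the text; a lone exclusive read is used here.
        return ["exclusive read" if exclusive else "read"]

    first = ("exclusive read-with-write-forthcoming" if exclusive
             else "read-with-write-forthcoming")
    return [first, "write"]  # a cluster: the write follows the pending read
```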

Primary and Secondary Cache Miss on a Store 

When the processor misses in both the primary and secondary caches 
on a store, it must obtain the cache line that contains the target location 
of the store from an external agent before it can proceed. In secondary 
cache mode, if the new cache line replaces a current cache line that is 
in the state dirty exclusive or dirty shared, the current cache line must 
be written back before the new line can be loaded in the primary and 
secondary caches. 

The processor examines the coherency attribute in the TLB entry for 
the page that contains the requested cache line to see if this cache line 
is being maintained with a write invalidate or a write update cache 
coherency protocol. If the coherency attribute is sharable or exclusive, 
a write invalidate protocol is in effect, and a coherent read that also 
requests exclusivity is issued. If the coherency attribute is update, a 
write update protocol is in effect and a coherent read request is issued. 
If the coherency attribute is noncoherent, a noncoherent read request 
is issued. 
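
For a store miss, the coherency attribute selects both the coherency protocol in effect and the read request type. A minimal Python sketch of that selection follows; the function name and string labels are hypothetical.

```python
def store_miss_read_request(coherency_attribute):
    """Return (protocol, read request type) for a store that misses both caches."""
    if coherency_attribute in ("sharable", "exclusive"):
        return ("write invalidate", "coherent read, exclusivity requested")
    if coherency_attribute == "update":
        return ("write update", "coherent read")
    if coherency_attribute == "noncoherent":
        return (None, "noncoherent read")   # no coherency protocol applies
    raise ValueError("unknown coherency attribute: %r" % (coherency_attribute,))
```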

In no-secondary-cache mode, the processor issues a read request for 
the cache line that contains the data element to be loaded. The 
processor then waits for an external agent to provide the read data in 
response to the read request. Then, if the current cache line must be
written back, the processor issues a write request for the current cache 
line. 

In secondary-cache mode, if the current cache line does not need to be 
written back and the coherency attribute for the page that contains the 
requested cache line is noncoherent, the processor issues a read 
request for the cache line that contains the target location of the store. 
If the current cache line does not need to be written back and the 
coherency attribute for the page that contains the requested cache line 
is sharable or exclusive, the processor issues a read request. If the 
current cache line does not need to be written back and the coherency
attribute for the page that contains the requested cache line is update,
and potential updates are enabled, the processor issues a cluster
consisting of a read request followed by a potential update request. If
the current cache line needs to be written back, and the coherency
attribute for the requested cache line is noncoherent, the processor
issues a cluster consisting of a read-with-write-forthcoming request 
for the cache line that contains the target location of the store followed 
by a write request for the current cache line. If the current cache line 
needs to be written back and the coherency attribute for the page that 
contains the requested cache line is sharable or exclusive, the 
processor issues a cluster consisting of a read-with-write-forthcoming 
request followed by a write request for the current cache line. If the 
current cache line needs to be written back and the coherency attribute 
for the page that contains the requested cache line is update, and 
potential updates are enabled, the processor issues a cluster consisting 
of a read-with-write-forthcoming request followed by a potential 
update request followed by a write request for the current cache line.

If the processor issues a cluster that contains a potential update, and
the response data for the read request is returned with an indication
that it must be placed in the cache in a shared state, either shared or
dirty shared, the potential update becomes compulsory. Once a
potential update becomes compulsory, the external agent must
forward the update to the system, and signal an acknowledge to the
processor when the update is complete. In this case the processor will
not complete the store until the update has been acknowledged.

If the processor issues a cluster that contains a potential update, and
the response data for the read request is returned in an exclusive state,
clean exclusive or dirty exclusive, the potential update is nullified.
Once a potential update has been nullified, the external agent must
simply discard the update. The processor will not wait for or expect an
acknowledge to a potential update that has been nullified.

If the processor issues a read request, or a cluster that does not contain
a potential update, and the response data for the read request is
returned with an indication that it must be placed in the cache in a
shared state, either shared or dirty shared, the processor will then
issue an invalidate request or an update request depending on the
coherency attribute for the page that contains the target location of the
store instruction. If the coherency attribute is update, an update
request is issued; otherwise an invalidate request is issued. The
external agent must forward the update to the system and signal an
acknowledge to the processor for the update request. The processor
will not complete the store until it has received an acknowledge for the
update request.
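
The fate of a potential update described above can be summarized in a
short sketch. This is only an illustration of the rule, not processor
logic; the names LineState, UpdateFate, resolve_potential_update and
store_waits_for_ack are hypothetical and do not appear in this manual.

```python
from enum import Enum


class LineState(Enum):
    """Cache state indicated with the read response data."""
    SHARED = "shared"
    DIRTY_SHARED = "dirty shared"
    CLEAN_EXCLUSIVE = "clean exclusive"
    DIRTY_EXCLUSIVE = "dirty exclusive"


class UpdateFate(Enum):
    COMPULSORY = "compulsory"  # agent forwards the update, then acknowledges
    NULLIFIED = "nullified"    # agent discards the update, no acknowledge


def resolve_potential_update(response_state):
    """Decide what becomes of a potential update issued in a cluster."""
    if response_state in (LineState.SHARED, LineState.DIRTY_SHARED):
        return UpdateFate.COMPULSORY
    return UpdateFate.NULLIFIED


def store_waits_for_ack(response_state):
    """The store is held only while a compulsory update is outstanding."""
    return resolve_potential_update(response_state) is UpdateFate.COMPULSORY
```

For example, a response returned dirty shared makes the update
compulsory, while a response returned clean exclusive nullifies it.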

The concept of potential updates is introduced to provide the external 
agent a chance to use the system bus more efficiently. In an update 
protocol, it is quite likely that a cache line requested by a processor 
coherent read request will be returned in a shared state, and that the
processor will then have to issue an update request before it can
complete a store instruction. The potential update issued with the 
read request in a cluster allows the external agent to anticipate the 
read response on the system bus, and, if it arrives with an indication 
that it is shared, to quickly gain control of the system bus and transmit 
the required update to the rest of the system. This provides the 
processor with the acknowledge as quickly as possible and also allows 
the processor to complete the store instruction as quickly as possible. 
Without the potential update request, the response data must be 
returned to the processor. The processor then issues an update request 
which must then be forwarded to the system bus before an 
acknowledge can be returned to the processor. 
Note that potential updates behave in all cases as if they have not yet 
been issued by the processor. Potential updates are not subject to 
cancellation, and do not expect or require an acknowledge. When a 
potential update is nullified, the processor behaves as if no update 
request was ever issued. When a potential update is no longer 
potential, the processor behaves as if it had issued an update request 
at that instant. Once a potential update is no longer potential it is
subject to cancellation, and the processor requires an acknowledge for
the update request.

Secondary Cache Hit on a Store to a Shared Line 

When the processor hits in the secondary cache on a cache line that is
marked shared or dirty shared, the processor must issue an invalidate
or update request and wait to receive an acknowledge before the store
can be completed. The processor checks the coherency attribute in the
TLB for the page that contains the cache line that is the target of the
store to determine if the cache line is being managed using a write
invalidate or write update cache coherency protocol. If the coherency
attribute is sharable or exclusive, a write invalidate protocol is in effect
and the processor issues an invalidate request. If the coherency
attribute is update, a write update protocol is in effect and the
processor issues an update request. The processor will not complete
the store until an external agent signals an acknowledge for the
request.
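
As an illustration, the choice between the two requests can be sketched
as a lookup on the coherency attribute. The function name and the
string values are hypothetical, chosen to mirror the terms used in the
text; this is a sketch, not the processor's implementation.

```python
def request_for_store_hit(line_state, coherency_attr):
    """Return the request issued for a store that hits a shared line.

    Returns None when the line is not shared or dirty shared, because
    that case is outside the scope of this sketch.
    """
    if line_state not in ("shared", "dirty shared"):
        return None
    if coherency_attr in ("sharable", "exclusive"):
        return "invalidate"  # write invalidate protocol in effect
    if coherency_attr == "update":
        return "update"      # write update protocol in effect
    raise ValueError("attribute not covered by this sketch: " + coherency_attr)
```

In both cases the store completes only after the external agent
acknowledges the request.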

Uncached Load or Store 

When the processor performs an uncached load, it issues a 
noncoherent read request. When the processor performs an uncached
store, it issues a write request.

Cache Operations



The processor provides a variety of cache operations for use in 
maintaining the state and contents of the primary and secondary 
caches. During the execution of the cache operation instructions, the 
processor may issue write requests or invalidate requests. 



External Request Handling 



An external agent must arbitrate with the processor for access to the
system interface before it can issue an external request. The external
agent signals that it wishes to begin an external request and waits for
the processor to signal that it is ready to accept the request before
issuing any new external requests. Based on its internal state and the
current state of the system interface, the processor decides when to
accept a new external request. The processor signals that it is ready to
accept an external request based on the following criteria:

1. If there are no processor requests pending, the processor
decides, based on its internal state, whether to accept the
external request or to issue a new processor request. The processor
may issue a new processor request while the external agent is
requesting access to the system interface to issue an external
request.

2. The processor will accept an external request after completing 
a processor request or a processor request cluster that is in 
progress. 

3. While waiting for the assertion of RdRdy* to issue a processor 
read request, the processor will accept an external request 
provided that the request is delivered to the processor one or 
more cycles before RdRdy* is asserted. 

4. While waiting for the assertion of WrRdy* to issue a processor 
write request, the processor will accept an external request 
provided that the request is delivered to the processor one or 
more cycles before WrRdy* is asserted. 

5. While waiting for the response to a read request after the
processor has made an uncompelled change to slave state, an
external agent may issue an external request before providing
the read response data.



Invalidate and Update Cancellation 

An external agent may discover that a processor invalidate or update
request cannot be completed based on state changes in the external
system that have not yet been reflected into the processor's caches. An
example of this in a bus-based system is when a processor issues an
invalidate, but, before the external bus interface can transmit the
invalidate, an invalidate is received from another processor that
targets the same cache line. In this case, the processor's cache does not
reflect the current state of the system, and the unacknowledged
invalidate cannot be transmitted. When this occurs, the external agent
must cancel the invalidate or update. The processor, upon receiving a
cancellation, will process any external requests that the external agent
wishes to issue and then re-examine the state of the cache to determine
what action to take.

In the previous example, this would cause the processor to process an 
external request to invalidate the cache line that was the target of the 
store. The processor would then re-examine the state of the cache and 
discover that the cache line that was the target of the store is now 
invalid. Finally, the processor processes the store as a store miss and
issues a read request instead of an invalidate request.

Potential updates may not be canceled until they become compulsory.
Potential updates are issued within a cluster under pending reads and
become compulsory after the read request is satisfied. In more general
terms, an external request that indicates processor update cancellation
may not be issued when a processor read is pending and may not be 
issued unless a compulsory update is unacknowledged. The behavior 
of the processor is undefined if the cancellation indication is signaled 
on an external coherence request to the processor while a processor 
read is pending or when there is no compulsory update 
unacknowledged. 
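
The condition under which a cancellation may be signaled can be written
as a simple predicate. This is a minimal sketch of the rule stated
above; the function and parameter names are hypothetical.

```python
def cancellation_allowed(read_pending, compulsory_update_unacked):
    """True when an external request may carry a cancellation indication.

    Signaling a cancellation when this returns False leaves the
    behavior of the processor undefined.
    """
    return (not read_pending) and compulsory_update_unacked
```

A potential update inside a cluster always has its read pending, so the
predicate is false until the read response has been returned and the
update has become compulsory.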

Load Linked Store Conditional Considerations 

Generally, the execution of a Load Linked Store Conditional 
instruction sequence is not visible at the system interface; that is, no 
special requests are generated due to the execution of this instruction 
sequence.

There is, however, one situation for which the execution of a Load
Linked Store Conditional instruction sequence is visible as a change in 
the nature of a processor read request. Specifically, if the data location 
targeted by a Load Linked Store Conditional instruction sequence 
maps to the same cache line that the instruction area containing the 
Load Linked Store Conditional code sequence is mapped to, then 
immediately after executing the Load Linked instruction the cache 
line that contains the link location will be replaced by the instruction 
line containing the code. The link address is kept in a register separate 
from the cache and remains active as long as the link bit remains set. 
The link bit is set by the Load Linked instruction, and is cleared by any 
change of cache state for the cache line containing the link address, or 
a return from exception. 

In order for the Load Linked Store Conditional instruction sequence 
to work correctly, all coherency traffic targeting the link address must 
be visible to the processor, and the cache line containing the link 
location must remain in a shared state in every cache in the system. 
This guarantees that a Store Conditional executed by some other 
processor is visible to the processor as a coherence request which 
changes the state of the cache line that contains the link location. To 
accomplish this, a read request issued by the processor which causes 
the replacement of a cache line that contains the link location while the 
link bit is set will indicate that the link address is being retained. The 
link address retained bit in the command for the read request will be 
asserted to provide this indication. This informs the external agent 
that even though the processor has replaced this cache line and no 
longer has it present in its cache, it still must see any coherence traffic 
that targets this cache line. 

In addition, any snoop or intervention request that targets a cache line 
which is not present in the cache, but for which the snoop or 
intervention address matches the current link address while the link 
bit is set, will return an indication that the cache line is present in the 
cache in a shared state. A shared indication is returned even though 
the processor does not actually have the data content of the cache line. 
This is consistent since the processor never returns data in response to 
an intervention request for a cache line that is in the shared state. The 
shared response guarantees that the cache line that contains the link 
location will remain in a shared state in all other processors' caches,
and therefore that any other processor attempting a store conditional 
to this link location must issue a coherence request in order to 
complete the store conditional. 
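
The snoop and intervention behavior described above can be sketched as
follows. The cache is modeled as a dictionary from line address to
state; that model, the function name, and the "invalid" result for a
line that is absent and not linked are assumptions made for
illustration only.

```python
def snoop_response(cache, address, link_address, link_bit):
    """Return the state reported for a snoop or intervention address.

    cache maps line-aligned addresses to state strings. Addresses are
    assumed to be already aligned to a cache line.
    """
    if address in cache:
        return cache[address]  # line is present: report its real state
    if link_bit and address == link_address:
        # Line was replaced, but the link address is still retained.
        return "shared"
    return "invalid"           # assumption: absent and not linked
```

Reporting shared without holding the data is safe because no data is
returned for an intervention to a line in the shared state.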

Introduction



The system interface protocol describes the cycle-by-cycle signal 
transitions that occur on the pins of the system interface to realize 
requests between the processor and an external agent.



The system interface is register to register. That is, processor outputs
come directly from output registers and begin to change with the
rising edge of SClock, and processor inputs are fed directly to input
registers that latch the inputs with the rising edge of SClock. (SClock
is an internal clock used by the processor to sample data at the system
interface and to clock data into the processor's system interface output
registers; see Chapter 10, "Clock/Control Interface" for more details.)
Therefore, if an input to the processor is changed during a
particular cycle in such a way that the new value is sampled at the end
of the cycle, the earliest the processor can change one of its outputs in
response to the input change is two cycles later. This methodology
was chosen to allow the system interface to run at the highest possible
clock frequency.
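
The two-cycle figure follows from counting one register stage on the
input side and one on the output side. A small worked sketch, with
hypothetical names and cycle numbers counted in SClock cycles:

```python
INPUT_REGISTER_DELAY = 1   # input is latched at the edge ending its cycle
OUTPUT_REGISTER_DELAY = 1  # output register is loaded one edge later


def earliest_output_change(input_change_cycle):
    """Earliest cycle in which an output can reflect an input change."""
    return input_change_cycle + INPUT_REGISTER_DELAY + OUTPUT_REGISTER_DELAY
```

An input that changes during cycle 10 can therefore affect a processor
output no earlier than cycle 12.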

The primary communication paths for the system interface are a
sixty-four bit address and data bus, SysAD(63:0), and a nine bit
command bus, SysCmd(8:0). The SysAD bus and the SysCmd bus are
bidirectional; that is, they are driven by the processor to issue a
processor request, and by an external agent to issue an external
request. When the processor is driving the SysAD bus and the
SysCmd bus, the system interface is in master state. When an external
agent is driving the SysAD bus and the SysCmd bus, the system
interface is in slave state.

A request through the system interface consists of an address, a 
system interface command that specifies the precise nature of the 
request, and a series of data elements if the request is for a write, read 
response, or update. Addresses and data elements are transmitted on
the SysAD bus. System interface commands are transmitted on the 
SysCmd bus. 

Cycles in which the SysAD bus contains a valid address are called
address cycles. Cycles in which the SysAD bus contains a valid data
element are called data cycles. In master state the processor will assert
the signal ValidOut* whenever the SysAD bus and the SysCmd bus
are valid. In slave state an external agent will assert the signal
ValidIn* whenever the SysAD bus and the SysCmd bus are valid.

The SysCmd bus is used to identify the contents of the SysAD bus
during any cycle in which it is valid. The most significant bit of the
SysCmd bus is always used to indicate whether the current cycle is an
address cycle or a data cycle. During address cycles, the remainder of
the SysCmd bus, SysCmd(7:0), contains a system interface command.
The encoding of system interface commands is detailed in the section
on system interface syntax. During data cycles, the remainder of the
SysCmd bus, SysCmd(7:0), contains an indication of whether this is
the last data element to be transmitted and other information about
the data element. The content of the SysCmd bus during data cycles is
called a data identifier. The encoding of data identifiers is detailed in
the section on system interface syntax.

A request through the system interface consists of one or more
identical address cycles, followed by a series of data cycles for
requests that include data. The most efficient request through the
system interface consists of a single address cycle followed by a single
data cycle or a number of data cycles sufficient to transmit a block of
data.
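
The split of the SysCmd bus into a cycle-type bit and an eight-bit
payload can be sketched as below. The text above does not state which
value of SysCmd(8) marks an address cycle, so the polarity constant
here is an assumption, and the function name is hypothetical.

```python
ADDRESS_CYCLE_FLAG = 0  # assumption: SysCmd(8) = 0 marks an address cycle


def decode_syscmd(syscmd):
    """Split a 9-bit SysCmd value into (cycle kind, payload kind, payload)."""
    if not 0 <= syscmd < (1 << 9):
        raise ValueError("SysCmd is a nine bit bus")
    msb = (syscmd >> 8) & 1  # SysCmd(8): address cycle or data cycle
    payload = syscmd & 0xFF  # SysCmd(7:0): command or data identifier
    if msb == ADDRESS_CYCLE_FLAG:
        return ("address", "command", payload)
    return ("data", "data identifier", payload)
```

The payload is returned undecoded; its encoding is covered in the
section on system interface syntax.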

System Interface Arbitration 

When an external agent needs to issue an external request through the 
system interface, it must first get the system interface into slave state. 
The transition from master state to slave state is arbitrated by the 
processor using the system interface handshake signals ExtRqst* and 
Release*. An external agent signals that it wishes to issue an external 
request by asserting ExtRqst*. Then, when the processor is ready to 
accept an external request, it releases the system interface from master 
state to slave state by asserting Release* for one cycle. The system 
interface returns to master state as soon as the issue of the external 
request is completed. Having asserted ExtRqst*, an external agent
may not deassert ExtRqst* until the processor asserts Release*. After
the processor asserts Release* for one cycle, the external agent must
deassert ExtRqst* two cycles after the assertion of Release*, unless
another external request follows the current request, in which case it
may continue to assert ExtRqst*. After the first external request
completes, the processor must assert Release* again before the second
external request is issued to the processor.

The system interface will remain in master state until the external
agent requests and is granted the system interface, or until the
processor issues a read request or completes the issue of a cluster.
Whenever a processor read request is pending, after the issue of a read
request or after the issue of all of the requests in a cluster, the processor
will switch the system interface to slave state even though the external
agent is not arbitrating to issue an external request. This transition to
slave state is specifically to allow the external agent to return read
response data. The external agent must not assert the signal ExtRqst*
for the purposes of returning read response data. ExtRqst* should
only be asserted when the external agent needs to get the system
interface into slave state to issue an external request.

The signal ExtRqst* is only used to arbitrate for the system interface,
that is, to request the transition of the system interface from master
state to slave state. ExtRqst* must always deassert two cycles after a
cycle in which Release* is asserted unless the external agent wishes to
perform a subsequent external request. ExtRqst* must not be asserted
while the system interface is in slave state unless the external agent
wishes to perform a subsequent external request.

The transition of the system interface from master state to slave state
initiated by the processor when a processor read request is pending
will be referred to as an uncompelled change to slave state. An
uncompelled change to slave state will occur during or some number
of cycles after the issue cycle of a read request or the last cycle of the
last request in a cluster. The number of cycles depends on the state of
the cache, the presence of a secondary cache and the secondary cache
parameters. After an uncompelled change to slave state, the system
interface remains in slave state until the external agent issues an
external request. After the external request, the system interface will
return to master state. An external agent must note that the processor
has performed an uncompelled change to slave state and begin
driving the address and data bus and the command bus. As long as the
system interface is in slave state, the external agent can begin an
external request without arbitrating for the system interface; that is,
without asserting ExtRqst*.

System Interface Request Descriptions 

The following sections illustrate, through the use of text and detailed
timing diagrams, the protocol of each processor and external request.
The timing diagrams use abbreviations to show the contents of
encoded buses during cycles in which they are defined. Following is
a list of abbreviations and definitions used for each bus.

Global:

Unsd - Unused.

SysAD bus:

Addr - Physical address.

Data<n> - Data element number n of a block of data.

SysCmd bus:

Cmd - An unspecified system interface command.

Read - A read request command.

RwWF - A read-with-write-forthcoming request command.

Write - A write request command.

Null - A null request command.

SINull - A system interface release null request command.

SCNull - A secondary cache release null request command.

Ivd - An invalidate request command.

Upd - An update request command.

Ivtn - An intervention request command.

Snoop - A snoop request command.

NData - A noncoherent data identifier for a data element other
than the last data element.

NEOD - A noncoherent data identifier for the last data element.

CData - A coherent data identifier for a data element other
than the last data element.

CEOD - A coherent data identifier for the last data element.

Two closely spaced wavy vertical lines in a timing diagram indicate a 
repetition of the current cycle. That is, the cycle broken by the wavy 
lines may represent one or more identical cycles. This is referred to as 
a break in the timing diagram and is used to keep the timing diagrams 
concise and readable. 

Invalidate and Update Acknowledge Protocol 

Processor invalidate and update requests are acknowledged using the
signals IvdAck* and IvdErr*. An external agent drives either IvdAck*
or IvdErr* for one cycle to signal the completion status of the current
processor invalidate or update request. The acknowledge occurs in
parallel with requests on the SysAD and SysCmd buses. IvdAck* or
IvdErr* may be driven at any time after a processor update request is
issued provided that the update request is compulsory.

Arbitration Protocol 

System interface arbitration is implemented using the signals
ExtRqst* and Release*. When an external agent wishes to submit an
external request, it asserts ExtRqst*. The processor waits until it is
ready to handle an external request and then asserts Release* for one
cycle before it tri-states the SysAD bus and SysCmd bus. The external
agent begins driving the SysAD bus and the SysCmd bus two cycles
after a cycle in which Release* is asserted. The external agent always
deasserts ExtRqst* two cycles after a cycle in which Release* is
asserted unless the external agent wishes to perform a subsequent
external request. The external agent always releases the SysAD bus
and the SysCmd bus at the completion of an external request.

The processor will assert Release* for one cycle as a processor read
request is issued or sometime after a processor read request is issued
to perform an uncompelled change to slave state. An external agent
must begin driving the SysAD bus and the SysCmd bus two cycles
after the cycle in which Release* is asserted. After an uncompelled
change to slave state, the processor will return to master state at the
end of the next external request, which may be the read response, or
may be some other external request.
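
The cycle relationships in this handshake can be laid out as a small
timeline. This is a sketch under the timing stated above; the function
name, the event strings, and the dictionary form are hypothetical.

```python
def arbitration_timeline(release_cycle, more_requests=False):
    """Map cycle numbers to the events of one external request grant.

    release_cycle is the cycle in which the processor asserts Release*.
    more_requests is True when the external agent intends to issue a
    subsequent external request and so keeps ExtRqst* asserted.
    """
    drive_cycle = release_cycle + 2  # two cycles after Release*
    events = {
        release_cycle: ["processor asserts Release* for one cycle"],
        drive_cycle: ["external agent begins driving SysAD and SysCmd"],
    }
    if not more_requests:
        events[drive_cycle].append("external agent deasserts ExtRqst*")
    return events
```

With Release* asserted in cycle 5, the external agent starts driving
both buses in cycle 7, and deasserts ExtRqst* in that same cycle unless
another external request follows.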

The processor to system handshake for external requests is illustrated 
in Figure 9-1. 




Figure 9-1 Arbitration Protocol for External Requests



Processor Read Request Protocol 



A processor read request is issued, with the system interface in master 
state, by driving a read command on the SysCmd bus and a read 
address on the SysAD bus and asserting ValidOut* for one cycle. 
Only one processor read request may be pending at a time. The 
processor must wait for and retire an external read response before 
initiating a subsequent read.

The processor makes an uncompelled change to slave state either at
the issue cycle of the read request or sometime after the issue cycle of
the read request by asserting the Release* signal for one cycle. Once in
slave state, an external agent may return the requested data via a read
response. An external agent must not assert the signal ExtRqst* for the
purposes of returning a read response, but rather must wait for the
uncompelled change to slave state. The signal ExtRqst* may be
asserted before or during a read response for the purposes of
performing an external request other than a read response.

When a read is pending, ExtRqst* is asserted, and Release* is asserted
for one cycle, it may be unclear if this assertion of Release* is in
response to ExtRqst*, or represents an uncompelled change to slave
state. The only situation in which this assertion of Release* may not
be considered an uncompelled change to slave state is if the system
interface is operating in secondary-cache mode, the read request was
a read-with-write-forthcoming request, and the expected write
request has not yet been issued by the processor. In this case, the write
request must be accepted by the external agent before the read
response can be issued. In all other cases, the assertion of Release*
may be considered to be an uncompelled change to slave state or to be
in response to the assertion of ExtRqst*. In this situation, the processor
will accept either a read response, or any other external request. If an
external request other than a response request is issued, the processor
will perform another uncompelled change to slave state after
processing of the external request is complete.
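
The one exception described above can be captured in a predicate that
an external agent might evaluate when it sees Release* with a read
pending and ExtRqst* asserted. This is an illustrative sketch only; the
function and parameter names are hypothetical.

```python
def read_response_allowed(secondary_cache_mode,
                          read_with_write_forthcoming,
                          write_issued):
    """True when Release* may be treated as an uncompelled change.

    Returns False only in the single exceptional case: secondary-cache
    mode, a read-with-write-forthcoming request, and the expected write
    not yet issued. In that case the write must be accepted first.
    """
    write_outstanding = (secondary_cache_mode
                         and read_with_write_forthcoming
                         and not write_issued)
    return not write_outstanding
```

When the predicate is true, the external agent may answer with either
the read response or some other external request.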
The response request may either return the requested data, or, if the
requested data could not be successfully retrieved, an indication that
the returned data is erroneous. If the returned data is erroneous, it
will cause the processor to take a bus error.

A processor read request and an uncompelled change to slave state
occurring as the read request is issued are illustrated in Figure 9-2. A
processor read request and the subsequent uncompelled change to
slave state occurring sometime after the read request is issued are
illustrated in Figure 9-3.

Figure 9-2 Processor Read Request Protocol

Figure 9-3 Processor Read Request Protocol, Change to Slave State Delayed

Processor Write Request Protocol

Processor write requests are issued with one of two protocols.
Doubleword, word, and partial word writes use a single word write
request protocol. Write requests for a block of data use a block write
request protocol. Processor write requests are issued with the system
interface in master state.

A processor single word write request is issued by driving a write 
command on the SysCmd bus and a write address on the SysAD bus 
and asserting ValidOut* for one cycle. This is followed by driving a 
data identifier on the SysCmd bus and data on the SysAD bus and 
asserting ValidOut* for one cycle. The data identifier associated with 
the data cycle must contain a last data cycle indication.

A processor coherent or noncoherent block write request is issued by
driving a write command on the SysCmd bus and a write address on
the SysAD bus and asserting ValidOut* for one cycle. This is followed
by driving a data identifier on the SysCmd bus and data on the
SysAD bus and asserting ValidOut* for a number of cycles sufficient
to transmit the block of data. The data identifier associated with the
last data cycle must contain a coherent or noncoherent last data cycle
indication. The first data cycle may not immediately follow the
address cycle. A processor noncoherent single word write request is
illustrated in Figure 9-4. A processor coherent block write request for
eight words of data is illustrated in Figure 9-5 and Figure 9-6.
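
The sequence of valid cycles in the two write protocols can be sketched
with the timing diagram abbreviations defined earlier (Write, Addr,
NData, NEOD, CData, CEOD). The function name is hypothetical, data
elements are numbered from zero as an assumption, and any unused cycles
between the address cycle and the data cycles are not modeled.

```python
def write_request_cycles(data_cycles=1, coherent=False):
    """Return (SysCmd, SysAD) pairs for the cycles with ValidOut* asserted.

    data_cycles=1 gives the single word write protocol; larger values
    give a block write.
    """
    if data_cycles < 1:
        raise ValueError("a write request carries at least one data cycle")
    prefix = "C" if coherent else "N"
    cycles = [("Write", "Addr")]  # address cycle
    for n in range(data_cycles):
        is_last = n == data_cycles - 1
        identifier = prefix + ("EOD" if is_last else "Data")
        cycles.append((identifier, f"Data{n}"))
    return cycles
```

A noncoherent single word write yields one address cycle followed by a
single NEOD data cycle. A coherent block write yields CData cycles that
end with one CEOD cycle.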

Figure 9-4 Processor Noncoherent Single Word Write Request Protocol

Figure 9-5 Processor Coherent Block Write Request Protocol

Figure 9-6 Processor Coherent Block Write Request Protocol

Processor Invalidate and Update Request Protocol

A processor invalidate request or update request will use the same
protocol as a coherent word write request except that the command
associated with the address cycle will indicate that this is an invalidate
or update request. The single data cycle will be unused for an
invalidate.

Processor Null Write Request Protocol 

A processor null write request is issued with the system interface in 
master state by driving a null command on the SysCmd bus and 
asserting ValidOut* for one cycle. The SysAD bus is unused during 
the address cycle associated with a null write request. Processor null
write requests cannot be flow controlled with either RdRdy* or
WrRdy*, but rather always issue with a single address cycle. A
processor null write request is illustrated in Figure 9-7.

Figure 9-7 Processor Null Write Request Protocol



Processor Cluster Protocol 



In secondary-cache mode, the processor issues requests both
individually, as in no-secondary-cache mode, and in groups that
begin with a processor read request called clusters. A cluster consists
of a processor read request followed by one or two additional
processor requests issued while the read request is pending. All of the
requests that are part of a cluster must be accepted before the response
to the read request that begins the cluster may be returned to the
processor. A cluster includes a processor read request followed by a
write request, or a processor read request followed by a potential
update request, or a processor read request followed by a potential
update request, followed by a write request.

The protocol of each of the requests that form a cluster is as described
above. The number of unused cycles between the requests that form a
cluster may be zero or greater. The processor makes an uncompelled
change to slave state either during or following the last cycle of the last
request in the cluster. A cluster consisting of a read request, followed
by an update request, followed by a coherent block write request for
eight words of data with minimum spacing between the requests that
form the cluster, and an uncompelled change to slave state at the
earliest opportunity is illustrated in Figure 9-8.
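
The three permitted cluster shapes can be enumerated directly. The
sketch below is an illustrative check, with hypothetical names and
request labels taken from the wording of the text.

```python
VALID_CLUSTERS = {
    ("read", "write"),
    ("read", "potential update"),
    ("read", "potential update", "write"),
}


def is_valid_cluster(requests):
    """True when the ordered request kinds form one of the three clusters."""
    return tuple(requests) in VALID_CLUSTERS
```

A lone read is an individual request rather than a cluster, and a write
may not precede the potential update.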

Figure 9-8 Processor Cluster Protocol



External Request Protocol 



External requests may only be issued with the system interface in
slave state. An external agent must assert ExtRqst* to arbitrate for the
system interface, and wait for the processor to release the system
interface to slave state before issuing an external request. If the system
interface is already in slave state, i.e., the processor has previously
performed an uncompelled change to slave state, an external agent
may begin an external request immediately.

After issuing an external request, an external agent must return the
system interface to master state. If the external agent does not have
any additional external requests to perform, ExtRqst* must be
deasserted two cycles after the cycle in which Release* is asserted. An
external agent may hold ExtRqst* asserted if it needs to issue a string
of external requests, but it must wait for the processor to assert
Release* and return the system interface to slave state before it
proceeds with the next external request. For a string of external
requests, the external agent must deassert ExtRqst* two cycles after
the cycle in which Release* is asserted for the last external request in
the string. The processor will continue to handle external requests as
long as ExtRqst* is held asserted; however, the processor will not
release the system interface to slave state for a subsequent external
request until it has completed the current request. A string of external
requests will not be interrupted by a processor request as long as
ExtRqst* is held asserted throughout the issue of the string of external
requests.

External Read Request Protocol 

External reads are requests for a word of data from some processor 
internal resource. External read requests use a non-split protocol that 
does not allow any other request to occur at the system interface 
between the external read request and the read response. The protocol 
of an external read request encompasses the request from an external 
agent and the response from the processor. 

An external read request consists of driving a read request command
on the SysCmd bus and a read request address on the SysAD bus and
asserting ValidIn* for one cycle. After the address and command are
sent, the external agent releases the SysCmd and SysAD buses and
allows the processor to begin driving them. The processor accesses the
data that is the target of the read and returns the data to the external
agent. The processor accomplishes this by driving a data identifier on
the SysCmd bus, the response data on the SysAD bus, and asserting
ValidOut* for one cycle. The data identifier indicates that this is
response data and contains a last data cycle indication. To
transition the system interface back to master state, the processor
continues driving the SysCmd and SysAD buses after the read
response is returned.

External read requests are only allowed to read a word of data from
the processor. The processor response to external read requests for
any data element other than a word is undefined.

An external read request with the system interface initially in master
state is illustrated in Figure 9-9.

Figure 9-9 External Read Request, System Interface in Master State.

NOTE: The R4000 does not contain any resources that are readable 
with an external read request. The R4000 returns a bus error response 
to any external read request. 

External Null Request Protocol 

The processor supports two kinds of external null requests. A system 
interface release external null request is used to return the system 
interface to master state after it has been released to slave state without 
affecting the processor. An scache release external null request is used to 
return ownership of the secondary cache to the processor while the 
system interface remains in slave state for some period of time. This is 
important since any time the processor releases the system interface to 
slave state to accept an external request, it also acquires ownership of 
the secondary cache for use by the external request in anticipation of 
handling a coherence request. When an external agent requests 
ownership of the system interface for the purposes of using the 
SysAD bus for a transfer unrelated to the processor (such as DMA), 
this ownership of the secondary cache prevents the processor from 
satisfying subsequent primary cache misses. The scache release 
external null request can be issued by the external agent to return 
ownership of the secondary cache to the processor. External null 
requests require no action from the processor other than to return the 
system interface to master state or to regain ownership of the 
secondary cache. 

An external null request consists of driving a null request command 
on the SysCmd bus and asserting ValidIn* for one cycle. The SysAD 
bus is unused (does not contain valid data) during the address cycle 
associated with an external null request. After the address cycle is 
issued, the null request is complete. For a system interface release 
external null request, the external agent releases the SysCmd and 
SysAD buses and allows the system interface to return to master state. 
For an scache release external null request, the system interface 
remains in slave state. An scache release external null request with the 
system interface initially in master state is illustrated in Figure 9-10. A 
system interface release external null request with the system interface 
initially in slave state is illustrated in Figure 9-11. 

Figure 9-10 Secondary Cache Release External Null Request 

Figure 9-11 System Interface Release External Null Request 

External Write Request Protocol 

External write requests use a protocol identical to the processor single 
word write protocol except that the signal ValidIn* is asserted instead 
of the signal ValidOut*. An external write request consists of driving 
a write command on the SysCmd bus and a write address on the 
SysAD bus and asserting ValidIn* for one cycle. This is followed by 
driving a data identifier on the SysCmd bus and data on the SysAD 
bus and asserting ValidIn* for one cycle. The data identifier 
associated with the data cycle must contain a coherent or noncoherent 
last data cycle indication. After the data cycle is issued, the write 
request is complete and the external agent releases the SysCmd and 
SysAD buses and allows the system interface to return to master state. 

External write requests are only allowed to write a word of data to the 
processor. The behavior of the processor in response to an external 
write request for any data element other than a word is undefined. 

An external write request with the system interface initially in master 
state is illustrated in Figure 9-12. 

Figure 9-12 External Write Request 

NOTE: The only writable resources in the R4000 are the processor 
interrupts. 

External Invalidate and Update Request Protocol 

External invalidate and update requests use a protocol identical to 
that for external write requests. The data element provided with an 
update request may be a double word, word, or partial word. The 
single data cycle will be unused (does not contain valid data) for an 
invalidate request. An external invalidate request following an 
uncompelled change to slave state is illustrated in Figure 9-13. 

Figure 9-13 External Invalidate Request following an Uncompelled 
Change to Slave State 



Read Response Protocol 



An external agent must return data to the processor in response to a 
processor read request. It does this by first waiting for the processor to 
perform an uncompelled change to slave state and then returning the 
data via a single data cycle or a series of data cycles sufficient to 
transmit the requested data. After the last data cycle is issued, the read 
response is complete and the external agent will release the SysCmd 
and SysAD buses and allow the system interface to return to master 
state. Note that the processor will always perform an uncompelled 
change to slave state at some time after issuing a read request. 

The data identifier for the data cycles must indicate that this is 
response data, and the data identifier associated with the last data 
cycle must contain a last data cycle indication. For read responses to 
coherent block read requests, each data identifier must include an 
indication of the cache state in which to load the response data. The 
cache state provided with each data identifier must be the same and 
must be either clean exclusive, dirty exclusive, shared, or dirty shared. 
The behavior of the processor is undefined if the cache state provided 
with the data identifiers changes during the transfer of the block of 
data, or if the cache state provided is invalid. 

The data identifier associated with a data cycle may indicate that the 
data transmitted during that cycle is erroneous; however, an external 
agent must return a block of data of the correct size regardless of 
erroneous data cycles. If a read response includes one or more 
erroneous data cycles, the processor takes a bus error. 

Read response data must only be delivered to the processor when a 
processor read request is pending; that is, in response to a processor 
read request. The behavior of the processor is undefined when a read 
response is presented to it and there is no processor read pending. 
Further, if the processor issues a read-with-write-forthcoming 
request, a processor write request or a processor null write request 
must be accepted before the read response may be returned. The 
behavior of the processor is undefined if the read response is returned 
before a processor write request is accepted. 
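
The read response rules above can be collected into a small checker. The 
following Python sketch is illustrative only and is not part of the R4000 
specification: the DataCycle record and its field names are hypothetical 
summaries of what a data identifier conveys, not the SysCmd bit encoding, 
and the result string "violation" is simply a label for a broken rule. 

```python
from dataclasses import dataclass
from typing import List, Optional

# Cache states the text allows in a response to a coherent block read.
VALID_STATES = {"clean exclusive", "dirty exclusive", "shared", "dirty shared"}


@dataclass
class DataCycle:
    """Hypothetical summary of one data identifier (not the real bit layout)."""
    is_response: bool                  # identifier marks this as response data
    is_last: bool                      # identifier carries the last data cycle indication
    cache_state: Optional[str] = None  # only checked for coherent block reads
    erroneous: bool = False            # identifier flags the data as erroneous


def check_read_response(cycles: List[DataCycle], expected_cycles: int,
                        coherent_block: bool) -> str:
    """Return 'ok', 'bus error', or 'violation' for a read response."""
    # A block of the correct size must be returned even if some data is erroneous.
    if expected_cycles < 1 or len(cycles) != expected_cycles:
        return "violation"
    if not all(c.is_response for c in cycles):
        return "violation"
    # The last data cycle must carry the last data cycle indication.
    if not cycles[-1].is_last:
        return "violation"
    if coherent_block:
        states = {c.cache_state for c in cycles}
        # The state must be identical in every cycle and one of the four valid
        # states; the text calls processor behavior undefined otherwise.
        if len(states) != 1 or not states <= VALID_STATES:
            return "violation"
    # One or more erroneous data cycles cause the processor to take a bus error.
    if any(c.erroneous for c in cycles):
        return "bus error"
    return "ok"
```

For example, a two-cycle response whose identifiers both report the shared 
state passes the check, while the same response with different states in 
the two cycles is flagged. 
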
A processor word read request followed by a word read response is 
illustrated in Figure 9-14. A read response for a processor block read 
with the system interface already in slave state is illustrated in Figure 
9-15. 

Figure 9-14 Processor Word Read Request followed by a Word Read Response 

Figure 9-15 Block Read Response, System Interface already in Slave State 

External Intervention Request Protocol 

External intervention requests use a protocol similar to that for 
external read requests except that a cache line size block of data may 
be returned along with an indication of the cache state for the cache 
line. The cache state indication depends upon the state of the cache 
line and the value of the data return bit in the intervention request 
command. 

The data return bit in the intervention request command may indicate 
return on dirty or return on exclusive. If the data return bit indicates 
return on dirty, and the cache line that is the target of the intervention 
request is in the state dirty exclusive or dirty shared, the contents of 
the cache line will be returned in response to the intervention request. 
If the data return bit indicates return on exclusive, and the cache line 
that is the target of the intervention request is in the state clean 
exclusive or dirty exclusive, the contents of the cache line will be 
returned in response to the intervention request. Otherwise, the 
response to the intervention request will not include the contents of 
the cache line, but will simply indicate the state of the cache line that 
is the target of the intervention request. Note that if the cache line that 
is the target of the intervention request is not present in the cache at 
all, that is, a tag comparison for the cache line at the target cache 
address fails, the cache line that is the target of the intervention 
request will be considered to be in the invalid state. 

The processor will return an indication of the cache state in which a 
cache line was found but not its contents by driving a coherent data 
identifier that indicates the state of the cache line on the SysCmd bus 
and asserting ValidOut* for one cycle. The SysAD bus is unused 
during this data cycle. The data identifier will indicate that this is a 
response data cycle and will contain a last data cycle indication. 

The processor will return the contents of a cache line along with an 
indication of the cache state in which it was found by issuing a 
sequence of data cycles sufficient to transmit the contents of the cache 
line. The data identifier transmitted with each data cycle indicates the 
cache state in which the cache line was found and that this is response 
data. The data identifier associated with the last data cycle will contain 
a last data cycle indication. 
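
The data return rule for intervention requests can be expressed as a short 
decision function. This Python sketch is an illustration rather than a 
definitive model; the string labels "dirty" and "exclusive" for the two 
values of the data return bit, and the use of None for a failed tag 
comparison, are assumptions made for readability. 

```python
from typing import Optional, Tuple

DIRTY_STATES = {"dirty exclusive", "dirty shared"}
EXCLUSIVE_STATES = {"clean exclusive", "dirty exclusive"}


def intervention_response(line_state: Optional[str],
                          data_return: str) -> Tuple[str, bool]:
    """Return (reported_state, returns_data) for an external intervention.

    line_state is None when the tag comparison fails (line not present).
    data_return is 'dirty' (return on dirty) or 'exclusive' (return on
    exclusive); both labels are hypothetical names for the data return bit.
    """
    if data_return not in ("dirty", "exclusive"):
        raise ValueError("data_return must be 'dirty' or 'exclusive'")
    # A line that is not present is reported as being in the invalid state.
    state = "invalid" if line_state is None else line_state
    wanted = DIRTY_STATES if data_return == "dirty" else EXCLUSIVE_STATES
    # Contents accompany the state only when the state matches the request.
    return state, state in wanted
```

With return on dirty, a dirty shared line is returned with its contents, 
whereas with return on exclusive the same line yields only a state 
indication. 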

If the contents of a cache line are returned in response to an 
intervention request, they will be returned in sub-block order starting 
with the double word at the address supplied with the intervention 
request. For further details on sub-block ordering see Appendix D. 
Note, however, that if the intervention address targets the double 
word at the beginning of the block, sub-block ordering is equivalent to 
sequential ordering. 
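
Appendix D is not reproduced in this section, so the following Python 
sketch rests on an assumption: that sub-block order is obtained by XORing 
the starting double word index with an incrementing counter. The function 
name and parameters are hypothetical. Under that assumption the sketch 
reproduces the property stated above, namely that a transfer starting at 
double word 0 is sequential. 

```python
from typing import List


def subblock_order(start_dw: int, block_dws: int) -> List[int]:
    """Double word indices in transfer order, assuming XOR sub-block ordering."""
    if block_dws < 1 or block_dws & (block_dws - 1):
        raise ValueError("block size must be a power of two double words")
    if not 0 <= start_dw < block_dws:
        raise ValueError("start index lies outside the block")
    # Assumed rule: index = starting index XOR an incrementing counter.
    return [start_dw ^ count for count in range(block_dws)]
```

For a block of four double words, subblock_order(0, 4) gives 0, 1, 2, 3 
and subblock_order(2, 4) gives 2, 3, 0, 1. 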

An external intervention request to a cache line found in the shared 
state with the system interface initially in master state is illustrated in 
Figure 9-16. An external intervention request to a cache line found in 
the dirty exclusive state with the system interface initially in slave 
state is illustrated in Figure 9-17. 



Figure 9-16 External Intervention Request, Shared Line, System Interface 
in Master State 

Figure 9-17 External Intervention Request, Dirty Exclusive Line, 
System Interface in Slave State 

External Snoop Request Protocol 



External snoop requests use a protocol identical to that for external 
read requests, except that, instead of returning data, the processor will 
respond to a snoop request with an indication of the current cache 
state for the cache line that is the target of the snoop request. The 
processor accomplishes this by driving a coherent data identifier on 
the SysCmd bus, and asserting ValidOut* for one cycle. The SysAD 
bus is unused during the snoop response. The processor will continue 
driving the SysCmd and SysAD buses after the snoop response is 
returned to transition the system interface back to master state. 

Note that if the cache line that is the target of the snoop request is not 
present in the cache at all, that is, a tag comparison for the cache line 
at the target cache address fails, the cache line that is the target of the 
snoop request will be considered to be in the invalid state. 

An external snoop request submitted with the system interface in 
master state is illustrated in Figure 9-18. An external snoop request 
submitted with the system interface in slave state is illustrated in 
Figure 9-19. 

Figure 9-18 External Snoop Request, System Interface in Master State 

Figure 9-19 External Snoop Request, System Interface in Slave State 



Processor Request and Cluster Flow Control 



The signal RdRdy* may be used by an external agent to control the 
flow of a processor read, invalidate, or update request or a processor 
read request followed by a potential update request within a cluster. 
The processor samples the signal RdRdy* to determine if the external 
agent is currently capable of accepting a read, invalidate, or update 
request, or a read request followed by a potential update request. The 
signal WrRdy* controls the flow of a processor write request. The 
processor will not complete the issue of a read, invalidate, or update 
request or a read request followed by a potential update request until 
it issues an address cycle for the request for which the signal RdRdy* 
was asserted two cycles previously. The processor will not complete 
the issue of a write request until it issues an address cycle for the write 
request for which the signal WrRdy* was asserted two cycles 
previously. 
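
The two-cycle relationship between RdRdy* (or WrRdy*) and the address 
cycle can be sketched as follows. This is a simplified Python model, not a 
timing specification: the function name is hypothetical, and the list of 
booleans stands for the sampled signal, with True meaning asserted (the 
pins themselves are active low). 

```python
from typing import Optional, Sequence


def first_issue_cycle(ready: Sequence[bool], earliest: int) -> Optional[int]:
    """First cycle >= earliest whose address cycle completes the request.

    ready[c] is True when RdRdy* (or WrRdy*) is asserted in cycle c.
    Returns None if the recorded samples never allow the issue to complete.
    """
    # An address cycle in cycle c completes the issue only if the ready
    # signal was asserted in cycle c - 2.
    for cycle in range(max(earliest, 2), len(ready) + 2):
        if ready[cycle - 2]:
            return cycle
    return None
```

For instance, if the signal is first asserted in cycle 2, a request that is 
otherwise ready in cycle 2 cannot complete its issue before cycle 4. 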

Two processor write requests in which the issue of the second is 
delayed for the assertion of WrRdy* are illustrated in Figure 9-20. A 
processor cluster in which the issue of the read and a potential update 
request are delayed for the assertion of RdRdy* is illustrated in Figure 
9-21. A processor cluster in which the issue of the write request is 
delayed for the assertion of WrRdy* is illustrated in Figure 9-22. The 
issue of a processor write request delayed for the assertion of WrRdy* 
and the completion of an external invalidate request is illustrated in 
Figure 9-23. 

Figure 9-20 Two Processor Write Requests, Second Write Delayed for 
the Assertion of WrRdy* 

Figure 9-21 Processor Read Request Within a Cluster Delayed for the 
Assertion of RdRdy* 

Figure 9-22 Processor Write Request Within a Cluster Delayed for the 
Assertion of WrRdy* 

Figure 9-23 Processor Write Request Delayed for the Assertion of 
WrRdy* and the Completion of an External Invalidate Request 

Data Rate Control 



The system interface supports a maximum data rate of one double 
word per cycle. The maximum data rate the processor can support is 
directly related to the secondary cache access time; if the access time is 
too long, the processor will not be able to transmit and accept data at 
the maximum rate. 

The rate at which data is delivered to the processor may be chosen by 
an external agent by driving data and asserting ValidIn* every n cycles 
instead of every cycle. The processor interprets a cycle as a valid data 
cycle only if ValidIn* is asserted and the SysCmd bus contains a data 
identifier during that cycle. The processor will continue to accept data 
until the data word tagged as the last data word is received. An 
external agent may deliver data at any rate it chooses but must not 
deliver data to the processor faster than it is capable of accepting it. 

Because the secondary cache is organized as a 128-bit RAM array, the 
processor will operate most efficiently if data is delivered to it in pairs 
of double words. It is most efficient to reduce the data rate by 
delivering a pair of double words to the processor, followed by some 
number of unused cycles, followed by another pair of double words. 
The pattern should be chosen to repeat at a rate determined by the 
secondary cache write cycle time. However, the processor will accept 
data in any pattern as long as the time between the transfer of any pair 
of odd-numbered double words is greater than or equal to the write 
cycle time of the secondary cache. Double words in the transfer 
pattern are numbered beginning at zero, such that the odd-numbered 
double words are the second, fourth, sixth, and so on, double words 
transferred. 

The maximum processor data rate for each of the possible secondary 
cache write cycle times and the most efficient data pattern for each 
data rate are listed in Table 9-1. In this and subsequent tables, data 
patterns are specified using the letters "D" and "x"; "D" indicates a data 
cycle and "x" indicates an unused cycle. A data pattern is specified as 
a sequence of letters, indicating a sequence of data and unused cycles 
that will be repeated to provide the appropriate data rate. For 
example, a data pattern specified by the sequence of letters "DDxx", to 
achieve a data rate of two double words every four cycles, is a data 
pattern in which two data cycles are followed by two unused cycles, 
followed by two data cycles and two unused cycles, and so on. A read 
response in which data is provided to the processor at a rate of two 
double words every three cycles using the data pattern "DDx" is shown 
in Figure 9-24. 

If data is delivered to the processor at a rate that exceeds the maximum 
the processor can support, based on the secondary cache write cycle 
time, the behavior of the processor is undefined. The secondary cache 
write cycle time is the sum of the parameters TWr1Dly, TWrSUp, and 
TWrRc described in the section on secondary cache write cycles. 

The rate at which the processor transmits data is programmable at boot 
time via the boot-time mode control interface. The transmit data rate 
may be programmed to any of the data rates and data patterns listed 
in Table 9-2, as long as the programmed data rate does not exceed the 
maximum the processor can support, based on the secondary cache 
access time. If a transmit data rate is programmed that exceeds the 
maximum the processor can support, the behavior of the processor is 
undefined. A processor write request for which the processor transmit 
data rate has been programmed to one double word every two cycles 
using the data pattern "DDxx" is shown in Figure 9-25. 

Table 9-1 and Table 9-2 show the maximum processor and transmit 
data rates that can be achieved for a given set of SCache parameters, 
based on a PClock-to-SClock divisor of 2. To find the maximum 
allowable SCache write cycle time and SCache access time, multiply 
the maximum SCache numbers for each pattern by: 

(PClock to SClock Divisor)/2 

The minimum number for these parameters will always be the 
minimum access time supported by R4000. 
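
The spacing rule for odd-numbered double words and the divisor scaling 
can be checked with a few lines of Python. This is a sketch under one 
stated assumption: that each letter of a pattern occupies one SClock cycle 
and that one SClock cycle lasts as many PCycles as the PClock-to-SClock 
divisor. With a divisor of 2, this reading reproduces the upper bounds 
listed in Table 9-1. The function names are hypothetical. 

```python
from fractions import Fraction


def data_rate(pattern: str) -> Fraction:
    """Double words per SClock cycle for a repeating pattern such as 'DDx'."""
    if not pattern or set(pattern) - {"D", "x"} or "D" not in pattern:
        raise ValueError("pattern must consist of 'D' and 'x' with at least one 'D'")
    return Fraction(pattern.count("D"), len(pattern))


def min_odd_spacing(pattern: str) -> int:
    """Smallest gap, in SClock cycles, between consecutive odd-numbered
    double words when the pattern repeats indefinitely."""
    data_rate(pattern)  # reuse the validation above
    # Four repetitions cover at least two full periods of the odd/even
    # numbering, including the wrap from one repetition to the next.
    cycles = [i for i, c in enumerate(pattern * 4) if c == "D"]
    odd = cycles[1::2]  # double words numbered 1, 3, 5, ...
    return min(b - a for a, b in zip(odd, odd[1:]))


def pattern_allowed(pattern: str, write_cycle_pcycles: int, divisor: int = 2) -> bool:
    """True if the pattern respects the secondary cache write cycle time."""
    # Assumption: one SClock cycle equals `divisor` PCycles.
    return min_odd_spacing(pattern) * divisor >= write_cycle_pcycles
```

Under these assumptions, pattern_allowed("DDx", 6) holds while 
pattern_allowed("D", 5) does not, and with a divisor of 4 the pattern "DDx" 
tolerates a write cycle time of up to 12 PCycles, in line with the 
(PClock to SClock Divisor)/2 scaling. 
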
Table 9-1 Maximum Processor Data Rates 

SCache Write Cycle Time    Max Data Rate                Best Data Pattern 
1-4 PCycles                1 Double/1 SClock Cycle      D 
5-6 PCycles                2 Doubles/3 SClock Cycles    DDx 
7-8 PCycles                1 Double/2 SClock Cycles     DDxx 
9-10 PCycles               2 Doubles/5 SClock Cycles    DDxxx 
11-12 PCycles              1 Double/3 SClock Cycles     DDxxxx 

Figure 9-24 Read Response, Reduced Data Rate, System Interface in Slave State 

Table 9-2 Transmit Data Rates 

Data Rate                    Data Pattern    Max SCache Access 
1 Double/1 SClock Cycle      D               4 PCycles 
2 Doubles/3 SClock Cycles    DDx             6 PCycles 
1 Double/2 SClock Cycles     DDxx            8 PCycles 
1 Double/2 SClock Cycles     DxDx            8 PCycles 
2 Doubles/5 SClock Cycles    DDxxx           10 PCycles 
1 Double/3 SClock Cycles     DDxxxx          12 PCycles 
1 Double/3 SClock Cycles     DxxDxx          12 PCycles 
1 Double/4 SClock Cycles     DDxxxxxx        16 PCycles 
1 Double/4 SClock Cycles     DxxxDxxx        16 PCycles 

Multiple Drivers on the SysAD Bus 

In most applications the SysAD bus will be a point-to-point connection 
from the processor to a bidirectional, registered transceiver in an 
external agent. For those applications, the SysAD bus has only two 
possible drivers, the processor and the external agent. However, 
certain applications may wish to add additional drivers and receivers 
to the SysAD bus, and allow transmissions to take place over the 
SysAD bus that the processor is not involved in. To accomplish this, 
the external agent must coordinate the usage of the SysAD bus using 
the arbitration handshake signals and the external null requests. 

To implement an independent transmission on the SysAD bus that 
does not involve the processor, the external agent must request the 
SysAD bus to issue an external request. If the processor is being used 
with a secondary cache, then after the processor releases the system 
interface to slave state, the external agent should issue an scache release 
external null request to return ownership of the secondary cache to the 
processor. The external agent can then allow the independent 
transmission to take place on the SysAD bus, making sure that 
ValidIn* is not asserted while the transmission is occurring. When the 
transmission is complete, the external agent can issue a system 
interface release external null request to return the system interface to 
master state. 

System Interface Endianness 

The endianness of the system interface is programmed at boot time 
through the boot time mode control interface, and remains fixed until 
the next time the processor mode bits are read. The endianness of the 
system interface and the external system cannot be changed by 
software; the reverse endian bit can be set by software to reverse the 
interpretation of endianness inside the processor, but the endianness of 
the system interface remains unchanged. 



Cycle Counts for System Interface Interactions 

To facilitate system design, the processor specifies minimum and 
maximum cycle counts for various processor transactions and for the 
processor's response time to external requests. Processor requests 
themselves are constrained by the system interface request protocol, 
and the cycle counts for such requests can be determined by 
examining the protocol. The spacing between requests within a 
cluster, the waiting period for the processor to release the system 
interface to slave state in response to an external request, and the 
response time for an external request that requires a response are 
variable and subject to minimum and maximum cycle counts. The 
remainder of this section describes and tabulates the minimum and 
maximum cycle counts for these system interface interactions. 

The minimum and maximum number of unused cycles between the 
requests within a cluster is a function of processor internal activity. 
The minimum number of unused cycles separating requests within a 
cluster is zero: the requests may be adjacent. The maximum number of 
unused cycles separating requests within a cluster varies depending 
on the requests that form the cluster. The minimum and maximum 
number of unused cycles separating requests within a cluster is 
summarized in Table 9-3. 
Table 9-3 Unused Cycles Separating Requests Within a Cluster 

From Processor          To Processor            Minimum Unused    Maximum Unused 
Request                 Request                 SClock Cycles     SClock Cycles 
Read                    Invalidate or Update    0                 2 
Read                    Write                   0                 2 
Invalidate or Update    Write                   0                 2 









The number of cycles the processor may wait to release the system 
interface to slave state for an external request is referred to as the 
release latency. The release latency is a function of processor internal 
activity and processor request activity. The processor will release the 
system interface to accept an external request under the conditions 
described above. When no processor requests are in progress, internal 
activity, such as refilling the primary cache from the secondary cache, 
may cause the processor to wait some number of cycles before 
releasing the system interface. Release latency is defined as the 
number of cycles for which ExtRqst* is asserted before the signal 
Release* is asserted. Release latency is considered in three categories: 

1. Release latency when the external request signal is asserted 
two cycles before the last cycle of a processor request, or two 
cycles before the last cycle of the last request in a cluster. 

2. Release latency when the external request signal is not asserted 
during a processor request or cluster, or is asserted during the 
last cycle of a processor request or cluster. 

3. Release latency when the processor does an uncompelled 
change to slave state. 

The minimum and maximum release latencies for requests that fall 
into categories (1), (2), and (3) above are summarized in Table 9-4. 
Table 9-4 Release Latency (in PCycles) for Category (1), (2), and (3) 
External Requests 

Category    Minimum*    Maximum* 
(1)         4           6 
(2)         4           24 
(3)         0           TBD 

*These cycle counts are preliminary and are subject to change. 





The number of cycles the processor may take to respond to an external 
request that requires a response, that is, an external intervention 
request, read request, or snoop request, will be referred to as the 
intervention response latency, external read response latency, or snoop 
response latency respectively. The number of cycles of latency is the 
number of unused cycles between the address cycle of the request and 
the first data cycle of the response. Intervention response latency and 
snoop response latency are a function of processor internal activity 
and secondary cache access time. The minimum and maximum 
intervention response latency and snoop response latency, as a 
function of secondary cache access time, is summarized in Table 9-5. 
External read response latency is purely a function of processor 
internal activity. The minimum and maximum external read response 
latency is summarized in Table 9-6. 

Table 9-5 Intervention Response Latency and Snoop Response Latency (in PCycles) 

                     Intervention response    Snoop response 
                     latency*                 latency* 
Max SCache Access    Min        Max           Min        Max 
1-4 PCycles          6          26            6          26 
5-6 PCycles          8          28            8          28 
7-8 PCycles          10         30            10         30 
9-10 PCycles         12         32            12         32 
11-12 PCycles        14         34            14         34 

*These cycle counts are preliminary, and are subject to change. 

Table 9-6 External Read Response Latency (in PCycles) 

                                  Min*    Max* 
External Read Response Latency    4       4 

*These cycle counts are preliminary, and are subject to change. 



System Interface Syntax 

System interface commands specify the precise nature and attributes 
of any system interface request during the address cycle for the 
request. System interface data identifiers specify the attributes of a 
data element transmitted during a system interface data cycle. The 
following sections describe the syntax, that is, the bitwise encoding, 
of system interface commands and data identifiers. 

For system interface commands and data identifiers associated with 
external requests, reserved bits and reserved fields in the command or 
data identifier should be deasserted, that is, set to one (1) or all ones 
respectively. For system interface commands and data identifiers 
associated with processor requests, reserved bits and reserved fields in 
the command and data identifier are undefined. 

System Interface Command and Data Identifier Syntax 

System interface commands and data identifiers are encoded in nine 
bits, and are transmitted from the processor to an external agent or 
from an external agent to the processor on the SysCmd bus during 
address and data cycles. Bit eight (most-significant bit) of the SysCmd 
bus determines whether the current content of the SysCmd bus is a 
command or a data identifier and, therefore, whether the current cycle 
is an address cycle or a data cycle. For system interface commands 
SysCmd(8) must be set to 0. For system interface data identifiers 
SysCmd(8) must be set to 1. 

System Interface Command Syntax 

This section defines the encoding of the SysCmd bus for system 
interface commands. A common encoding is used for all system 
interface commands. SysCmd(8) must be set to 0 for all system 
interface commands. 

For all system interface commands SysCmd(7:5) specify the system 
interface request type, which may be read, write, null, invalidate, 
update, intervention, or snoop. The encoding of SysCmd(7:5) for 
system interface commands is shown in Table 9-7. 
Table 9-7 Encoding of SysCmd(7:5) for System Interface Commands 

SysCmd(7:5)    Command 
0              Read Request 
1              Read-With-Write-Forthcoming Request 
2              Write Request 
3              Null Request 
4              Invalidate Request 
5              Update Request 
6              Intervention Request 
7              Snoop Request 
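
The role of SysCmd(8) and the request type field in SysCmd(7:5) can be 
illustrated with a short decoder. The Python below is a sketch that follows 
Table 9-7; the function and constant names are hypothetical and the code 
does not interpret the remaining attribute bits. 

```python
# Request types indexed by the value of SysCmd(7:5), per Table 9-7.
REQUEST_TYPES = (
    "read", "read-with-write-forthcoming", "write", "null",
    "invalidate", "update", "intervention", "snoop",
)


def classify_syscmd(syscmd: int) -> str:
    """Classify a 9-bit SysCmd value as a data identifier or a command type."""
    if not 0 <= syscmd < (1 << 9):
        raise ValueError("SysCmd is nine bits wide")
    if (syscmd >> 8) & 1:
        # SysCmd(8) = 1 marks a data identifier, hence a data cycle.
        return "data identifier"
    # SysCmd(8) = 0 marks a command; SysCmd(7:5) selects the request type.
    return REQUEST_TYPES[(syscmd >> 5) & 0b111]
```

For example, classify_syscmd(0b0_011_00000) reports a null request, and any 
value with bit 8 set is reported as a data identifier. 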



For read requests, the remainder of the SysCmd bus specifies the 
attributes of the read. SysCmd(4:3) encode block, coherency, and 
exclusivity attributes for the read. A read request with a write request 
forthcoming cannot be a double word, word, or partial word read. For 
coherent and noncoherent block reads SysCmd(2) specifies whether 
the address of the cache line being replaced by this read request is 
being retained in the link address register and SysCmd(1:0) encodes 
the block size for the read. For double word, word, or partial word 
reads, SysCmd(2:0) encodes the size of the read data in bytes. The 
encodings of SysCmd(4:3) for read commands are shown below in 
Table 9-8. The encodings of SysCmd(2:0) for coherent and 
noncoherent block reads and double word, word, or partial word 
reads are shown in Table 9-9 and Table 9-10, respectively. 

Table 9-8 Encoding of SysCmd(4:3) for Read Requests 

SysCmd(4:3)    Read attributes 
0              Coherent block read 
1              Coherent block read, exclusivity requested 
2              Noncoherent block read 
3              Double word, single word, or partial word read 

Table 9-9 Encoding of SysCmd(2:0) for Coherent and Noncoherent Block Read 
Requests 

SysCmd(2)      Link address retained indication 
0              Link address not retained 
1              Link address retained 

SysCmd(1:0)    Read block size 
0              Four words 
1              Eight words 
2              Sixteen words 
3              Thirty-two words 

Table 9-10 Encoding of SysCmd(2:0) for Double Word, Word, or 
Partial Word Read Requests 

SysCmd(2:0)    Read data size 
0              One byte valid (Byte) 
1              Two bytes valid (Halfword) 
2              Three bytes valid (Tribyte) 
3              Four bytes valid (Word) 
4              Five bytes valid (Quintibyte) 
5              Six bytes valid (Sextibyte) 
6              Seven bytes valid (Septibyte) 
7              Eight bytes valid (Double Word) 
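
The read command fields described in Table 9-8 through Table 9-10 can be 
decoded as shown in the following Python sketch. It is meant as an 
illustration of the bit layout only; the function name and the keys of the 
returned dictionary are hypothetical. 

```python
READ_ATTRIBUTES = (                     # SysCmd(4:3), Table 9-8
    "coherent block read",
    "coherent block read, exclusivity requested",
    "noncoherent block read",
    "double word, word, or partial word read",
)
BLOCK_WORDS = (4, 8, 16, 32)            # SysCmd(1:0), Table 9-9


def decode_read(syscmd: int) -> dict:
    """Decode a 9-bit read or read-with-write-forthcoming command."""
    if not 0 <= syscmd < (1 << 9):
        raise ValueError("SysCmd is nine bits wide")
    request = (syscmd >> 5) & 0b111                     # SysCmd(7:5)
    if (syscmd >> 8) & 1 or request not in (0, 1):
        raise ValueError("not a read command")
    attr = (syscmd >> 3) & 0b11                         # SysCmd(4:3)
    if request == 1 and attr == 3:
        # A read with a write forthcoming must be a block read.
        raise ValueError("write-forthcoming reads must be block reads")
    info = {"write_forthcoming": request == 1,
            "attributes": READ_ATTRIBUTES[attr]}
    if attr == 3:
        info["bytes"] = (syscmd & 0b111) + 1            # SysCmd(2:0), Table 9-10
    else:
        info["link_address_retained"] = bool((syscmd >> 2) & 1)  # SysCmd(2)
        info["block_words"] = BLOCK_WORDS[syscmd & 0b11]         # SysCmd(1:0)
    return info
```

For example, decode_read(0b0_000_11_011) describes a word read of four 
bytes, and decode_read(0b0_001_00_1_10) describes a coherent block read of 
sixteen words with a write forthcoming and the link address retained. 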



For write requests, the remainder of the SysCmd bus specifies the
attributes of the write. SysCmd(4:3) encode block attributes for the
write. For block writes, SysCmd(2) specifies whether the cache line
associated with the write request will be replaced or retained after the
write is completed, and SysCmd(1:0) encode the block size for the
write. For double word, word, or partial word writes, SysCmd(2:0)
encode the size of the write data in bytes. The encodings of
SysCmd(4:3) for write commands are shown below in Table 9-11. The
encodings of SysCmd(2:0) for block writes or double word, word, or
partial word writes are shown in Table 9-12 and Table 9-13,
respectively.






Table 9-11 Encoding of SysCmd(4:3) for Write Requests

SysCmd(4:3)   Write attributes
0             Reserved
1             Reserved
2             Block write
3             Double word, word, or partial word write



Table 9-12 Encoding of SysCmd(2:0) for Block Write Requests

SysCmd(2)     Cache line replacement attributes
0             Cache line replaced
1             Cache line retained

SysCmd(1:0)   Write block size
0             Four words
1             Eight words
2             Sixteen words
3             Thirty-two words



Table 9-13 Encoding of SysCmd(2:0) for Double Word, Word, or
Partial Word Write Requests

SysCmd(2:0)   Write data size
0             One byte valid (Byte)
1             Two bytes valid (Halfword)
2             Three bytes valid (Tribyte)
3             Four bytes valid (Word)
4             Five bytes valid (Quintibyte)
5             Six bytes valid (Sextibyte)
6             Seven bytes valid (Septibyte)
7             Eight bytes valid (Double Word)



Processor null write requests, system interface release external null
requests, and scache release external null requests all use the null
request command. For processor null requests, SysCmd(4:3) specifies
that this is a null write request. For external null requests,
SysCmd(4:3) specifies whether this is a system interface release null
request or an scache release null request. The encodings of SysCmd(4:3)
for processor null requests are shown in Table 9-14. The encodings of
SysCmd(4:3) for external null requests are shown in Table 9-15.

Table 9-14 Encoding of SysCmd(4:3) for Processor Null Requests

SysCmd(4:3)   Null attributes
0             Null write
1             Reserved
2             Reserved
3             Reserved



Table 9-15 Encoding of SysCmd(4:3) for External Null Requests

SysCmd(4:3)   Null attributes
0             System interface release
1             Scache release
2             Reserved
3             Reserved



For invalidate and update requests, SysCmd(4) is used by external
requests to indicate that the external request is in conflict with an
unacknowledged processor update request, canceling the update.
SysCmd(4) is reserved for processor update requests. SysCmd(3) is
used by processor update requests to specify whether the update is
potential or compulsory. SysCmd(3) is reserved for processor invalidate
requests. SysCmd(3) is used by external update requests to indicate
whether the update request will change the state of the updated cache
line to shared, or leave the state of the updated cache line unchanged.
SysCmd(2:0) specifies the size of the data element in bytes for update
requests. The encodings of SysCmd(4:0) for processor invalidate and
update requests are shown in Table 9-16. The encodings of SysCmd(4:0)
for external invalidate and update requests are shown in Table 9-17.
SysCmd(4:0) is reserved for processor invalidate requests.

Table 9-16 Encoding of SysCmd(4:0) for Processor Update Requests

SysCmd(4)     Reserved

SysCmd(3)     Update type
0             Compulsory
1             Potential

SysCmd(2:0)   Update data size
0             One byte valid (Byte)
1             Two bytes valid (Halfword)
2             Three bytes valid (Tribyte)
3             Four bytes valid (Word)
4             Five bytes valid (Quintibyte)
5             Six bytes valid (Sextibyte)
6             Seven bytes valid (Septibyte)
7             Eight bytes valid (Double Word)



Table 9-17 Encoding of SysCmd(4:0) for External Update Requests

SysCmd(4)     Processor unacknowledged update cancellation
0             Update cancelled
1             No cancellation

SysCmd(3)     Update cache state change attributes
0             Cache state changed to Shared
1             No change to cache state

SysCmd(2:0)   Update data size
0             One byte valid (Byte)
1             Two bytes valid (Halfword)
2             Three bytes valid (Tribyte)
3             Four bytes valid (Word)
4             Five bytes valid (Quintibyte)
5             Six bytes valid (Sextibyte)
6             Seven bytes valid (Septibyte)
7             Eight bytes valid (Double Word)



NOTE: SysCmd(3:0) is reserved for External Invalidate Requests.

For intervention and snoop requests, SysCmd(4) is used to indicate
that this external request is in conflict with an unacknowledged
processor update request, canceling the update. The processor never
issues an intervention or snoop request. SysCmd(3) is the data
response on dirty bit for intervention requests and is reserved for
snoop requests. If the data response on dirty bit is asserted, the
processor returns the contents of the cache line in response to an
intervention request if the line is found in state dirty exclusive or dirty
shared. If the data response on dirty bit is deasserted, the processor
returns the contents of the cache line in response to an intervention
request if the line is found in state clean exclusive or dirty exclusive.
For both snoop and intervention requests, SysCmd(2:0) specify a
cache state change function that is applied to the cache line atomically
with respect to the intervention or snoop response.
The encodings of SysCmd(4:0) for intervention requests are shown in
Table 9-18; the encodings of SysCmd(4:0) for snoop requests are shown
in Table 9-19.
Table 9-18 Encodings of SysCmd(4:0) for Intervention Requests

SysCmd(4)     Processor unacknowledged update cancellation
0             Update cancelled
1             No cancellation

SysCmd(3)     Data response on dirty bit
0             Return cache line data if in state dirty exclusive or dirty shared
1             Return cache line data if in state clean exclusive or dirty exclusive

SysCmd(2:0)   Cache state change function
0             No change to cache state
1             If cache state is clean exclusive, change to shared, otherwise no
              change to cache state
2             If cache state is clean exclusive or shared, change to invalid,
              otherwise no change to cache state
3             If cache state is clean exclusive, change to shared or if cache state is
              dirty exclusive, change to dirty shared, otherwise no change to cache
              state
4             If cache state is clean exclusive, dirty exclusive, or dirty shared,
              change to shared, otherwise no change to cache state
5             Change to invalid regardless of current cache state
6             Reserved
7             Reserved

Table 9-19 Encodings of SysCmd(4:0) for Snoop Requests

SysCmd(4)     Processor unacknowledged update cancellation
0             Update cancelled
1             No cancellation

SysCmd(3)     Reserved

SysCmd(2:0)   Cache state change function
0             No change to cache state
1             If cache state is clean exclusive, change to shared, otherwise no
              change to cache state
2             If cache state is clean exclusive or shared, change to invalid,
              otherwise no change to cache state
3             If cache state is clean exclusive, change to shared or if cache state is
              dirty exclusive, change to dirty shared, otherwise no change to cache
              state
4             If cache state is clean exclusive, dirty exclusive, or dirty shared,
              change to shared, otherwise no change to cache state
5             Change to invalid regardless of current cache state
6             Reserved
7             Reserved
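
The cache state change functions selected by SysCmd(2:0) for intervention and snoop requests can be modeled as a pure function from the current state to the next state. The following Python sketch is illustrative only; the state names are plain strings and apply_state_change is a hypothetical helper, not a description of the processor's internal logic.

```python
INVALID = "invalid"
SHARED = "shared"
CLEAN_EXCLUSIVE = "clean exclusive"
DIRTY_EXCLUSIVE = "dirty exclusive"
DIRTY_SHARED = "dirty shared"


def apply_state_change(function, state):
    """Return the cache state after applying state change function 0..5."""
    if function == 0:
        return state
    if function == 1:
        return SHARED if state == CLEAN_EXCLUSIVE else state
    if function == 2:
        return INVALID if state in (CLEAN_EXCLUSIVE, SHARED) else state
    if function == 3:
        if state == CLEAN_EXCLUSIVE:
            return SHARED
        if state == DIRTY_EXCLUSIVE:
            return DIRTY_SHARED
        return state
    if function == 4:
        exclusive_or_dirty = (CLEAN_EXCLUSIVE, DIRTY_EXCLUSIVE, DIRTY_SHARED)
        return SHARED if state in exclusive_or_dirty else state
    if function == 5:
        return INVALID
    # Encodings 6 and 7 are reserved.
    raise ValueError("reserved cache state change function")
```

For example, function 3 applied to a dirty exclusive line yields dirty shared, while function 2 leaves a dirty exclusive line unchanged.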



System Interface Data Identifier Syntax 

This section defines the encoding of the SysCmd bus for system 
interface data identifiers. A common encoding is used for all system 
interface data identifiers. SysCmd(8) must be set to 1 for all system 
interface data identifiers. System interface data identifiers have two 
formats, one for coherent data and another for noncoherent data: 

• Data associated with processor block write requests and
processor double word, word, or partial word write
requests is noncoherent.

• Data associated with processor update requests is
noncoherent.

• Data returned in response to a processor coherent block
read request is coherent.

• Data returned in response to a processor noncoherent block
read request or a processor double word, word, or partial
word read request is noncoherent.

• Data associated with external update requests is
noncoherent.

• Data associated with external write requests is
noncoherent.

• Data returned in response to an external read request is
noncoherent.

• Data returned in response to an external intervention
request is coherent.

For coherent and noncoherent data identifiers, both processor and
external, SysCmd(7) marks the last data element and SysCmd(6)
indicates whether or not the data is response data. Response data is
data returned in response to a read request or an intervention request.
SysCmd(5) is the good data bit and indicates whether or not the data
element is error free. Erroneous data contains an uncorrectable error.
Erroneous data returned to the processor will cause a processor bus
error. The processor will deliver data with the good data bit
deasserted when a primary parity error is detected for a transmitted
data item. A secondary cache data ECC error can be detected by
comparing the values transmitted on the SysAD and SysADC. For
external data identifiers, both coherent and noncoherent, SysCmd(4)
indicates to the processor whether to check the data and check bits for
this data element, and SysCmd(3) is reserved. For processor data
identifiers, both coherent and noncoherent, SysCmd(4:3) are reserved.
For coherent data identifiers, SysCmd(2:0) indicate a cache state for the
data. This indication provides the cache state with which to load the
cache line for responses to processor coherent read requests. It also
indicates the cache state in which the line was found for data
associated with the response to an external intervention request or for
the data cycle issued in response to an external snoop request. For
noncoherent data identifiers, SysCmd(2:0) is reserved.
The encodings of SysCmd(7:3) for processor data identifiers are
illustrated in Table 9-20. The encodings of SysCmd(7:3) for external
data identifiers are illustrated in Table 9-21. The encodings of
SysCmd(2:0) for coherent data identifiers are illustrated in Table 9-22.

Table 9-20 Encoding of SysCmd(7:3) for Processor Data Identifiers

SysCmd(7)     Last data element indication
0             Last data element
1             Not the last data element

SysCmd(6)     Response data indication
0             Data is response data
1             Data is not response data

SysCmd(5)     Good data indication
0             Data is error free
1             Data is erroneous

SysCmd(4:3)   Reserved

Table 9-21 Encoding of SysCmd(7:3) for External Data Identifiers

SysCmd(7)     Last data element indication
0             Last data element
1             Not the last data element

SysCmd(6)     Response data indication
0             Data is response data
1             Data is not response data

SysCmd(5)     Good data indication
0             Data is error free
1             Data is erroneous

SysCmd(4)     Data checking enable
0             Check the data and check bits
1             Don't check the data and check bits

SysCmd(3)     Reserved

Table 9-22 Encoding of SysCmd(2:0) for Coherent Data Identifiers

SysCmd(2:0)   Cache state
0             Invalid
1             Reserved
2             Reserved
3             Reserved
4             Clean Exclusive
5             Dirty Exclusive
6             Shared
7             Dirty Shared
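
As an illustration of the data identifier format, the following Python sketch decodes a nine-bit SysCmd value carrying an external data identifier. The function name, the field names in the returned dictionary, and the coherent argument are hypothetical conveniences; whether an identifier is coherent is determined by the request it belongs to, so the caller must supply it.

```python
# SysCmd(2:0) cache states for coherent data identifiers (Table 9-22).
CACHE_STATES = {
    0: "invalid",
    4: "clean exclusive",
    5: "dirty exclusive",
    6: "shared",
    7: "dirty shared",
}


def decode_external_data_identifier(syscmd, coherent):
    """Decode SysCmd(8:0) of an external data identifier (Tables 9-21, 9-22)."""
    if not (syscmd >> 8) & 1:
        raise ValueError("SysCmd(8) must be 1 for a data identifier")
    fields = {
        # Every flag in SysCmd(7:4) uses 0 for the asserted meaning.
        "last_element": ((syscmd >> 7) & 1) == 0,
        "response_data": ((syscmd >> 6) & 1) == 0,
        "good_data": ((syscmd >> 5) & 1) == 0,
        "check_data": ((syscmd >> 4) & 1) == 0,
    }
    if coherent:
        fields["cache_state"] = CACHE_STATES.get(syscmd & 0x7, "reserved")
    return fields
```

Under these assumptions, the value 0x104 decodes as the last element of error-free response data, with checking enabled, to be loaded in the clean exclusive state.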



System Interface Addresses 



System interface addresses are full 36-bit physical addresses 
presented on the least-significant 36 bits (bits 35 through 0) of the 
SysAD bus during address cycles. The remaining bits of the SysAD 
bus are unused during address cycles. Addresses associated with 
double word, word, or partial word transactions, i.e. double word, 
word, or partial word read and write requests and update requests, 
are aligned for the size of the data element. Specifically, for double
word requests, the low-order three bits of the address are zero; for 
word requests, the low-order two bits of the address are zero; and for 
half-word requests, the low-order bit of the address is zero. For byte, 
tri-byte, quinti-byte, sexti-byte and septi-byte requests the address 
provided is a byte address. 
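
The alignment rules above can be checked with a few lines of code. This Python sketch is a hedged illustration; address_is_aligned is a hypothetical helper, and it checks only the low-order address bits named in the text.

```python
def address_is_aligned(address, size_bytes):
    """Check a 36-bit physical address against the rules for a 1..8 byte element."""
    if not 1 <= size_bytes <= 8:
        raise ValueError("element size must be 1 to 8 bytes")
    if address < 0 or address >> 36:
        raise ValueError("system interface addresses are 36 bits")
    # Double word, word, and halfword requests need zero low-order bits;
    # all other sizes are presented as plain byte addresses.
    required = {8: 8, 4: 4, 2: 2}.get(size_bytes, 1)
    return address % required == 0
```

For example, address 0x1004 is acceptable for a word request but not for a double word request.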

Addresses associated with block requests are aligned to double-word 
boundaries; that is, the low-order three bits of the address are zero. 
The order in which data is returned in response to a processor block 
read request can be programmed via the boot-time mode control 
interface to sequential ordering or sub-block ordering. If sequential 
ordering is enabled, the processor always delivers the address of the 
double word at the beginning of the block on a block read request. An
external agent must return the block of data sequentially starting at
the beginning of the block. If sub-block ordering is enabled, the
processor delivers the address of the double word within the block
that it wants returned first. An external agent must return the block of
data using sub-block ordering starting with the addressed double
word. For further details on sub-block ordering see Appendix D. Only
an R4000 in the R4000SC and R4000MC configuration with a
secondary cache may be programmed to use sequential ordering.

For block write requests, the processor always delivers the double
word address of the double word at the beginning of the block, and
delivers data beginning with the double word at the beginning of the
block and progressing sequentially through the double words that
form the block.

During data cycles, the driven byte lines depend upon the position of
the data with respect to the aligned double word containing the data
(this may be a byte, halfword, tri-byte, word, quinti-byte, sexti-byte,
septi-byte, or a double word). For example, on a byte request whose
address modulo 8 is 0, SysAD(7:0) are driven during the data cycles.
Please refer to Figure 22.



Processor Internal Address Map 



External reads and writes to the processor are provided to access
processor internal resources that may be of interest to an external
agent. However, the R4000 does not contain any resources that are
readable with an external read request. The R4000 will return a bus
error response to any external read request. The only writable
resource in this version of the R4000 is the processor interrupts.
The processor decodes bits 6:4 of the address associated with an
external read or write request to determine which processor internal
resource is the target of the request. The only processor internal
resource available for access by an external request is the interrupt
resource, and it is only accessible via an external write request. The
interrupt resource is accessed via an external write request with an
address of 000 on bits 6:4 of the SysAD bus. See the section on
interrupts for further details on external writes to the interrupt
resource.

This chapter describes the clocks used in the R4000 and the processor
status reporting mechanism. The topics covered here include:

• Basic System Clocks 

• Interfacing to a Phase-Locked System 

• Interfacing to a System without Phase Locking 

• Processor Status Outputs 

Basic System Clocks 

Each clock in the R4000 is explained below. 

MasterClock 

The processor bases all internal and external clocking on the single
clock input MasterClock. The processor generates the clock output
MasterOut at the same frequency as MasterClock and aligns
MasterOut with MasterClock if SyncIn is shorted to SyncOut.
MasterOut is provided for use in clocking external logic that must
cycle at MasterClock frequency, such as the reset logic, and the
processor aligns MasterOut with SyncOut.

SyncIn/SyncOut

The processor generates the clock output SyncOut at the same
frequency as MasterClock and aligns SyncIn with MasterClock.
SyncOut must be connected to the clock input SyncIn so that the
processor can compensate for output driver delays and input buffer
delays in aligning SyncIn with MasterClock.

PClock

The processor generates the internal clock PClock at twice the
frequency of MasterClock and precisely aligns every other rising edge
of PClock with the rising edge of MasterClock. PClock is used by all
internal registers and latches.

SClock

The processor divides PClock by 2, 3, or 4 (programmed via the
initialization control interface) to generate the internal clock SClock.
SClock is used by the processor to sample data at the system interface
and to clock data into the processor's system interface output
registers. The rising and falling edges of SClock are aligned with the
rising edges of PClock.

TClock

The processor generates TClock at the same frequency as SClock.
TClock is a transmit clock that can be used by an external agent to
clock its output registers and as the global system clock for the logic
that makes up the external agent. TClock is identical to SClock, and
the edges of TClock are precisely aligned with the edges of SClock.
TClock is used by external agent circuitry. TClock is aligned with
MasterClock if SyncIn is shorted to SyncOut.

RClock

The processor generates RClock at the same frequency as SClock.
RClock is a receive clock that can be used by an external agent to clock
its input registers. RClock is skewed with respect to TClock and
SClock so that it leads TClock by 25% of the SClock cycle time. RClock
is used by external agent circuitry.

Figure 10-1 shows the clocks for a PClock-to-SClock division of two.
Figure 10-2 shows the clocks for a PClock-to-SClock division of four.
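
The frequency relationships among these clocks can be summarized numerically. The Python sketch below is illustrative only: derive_clocks and its dictionary keys are hypothetical names, and the 50 MHz MasterClock used in the example is an assumed value, not a rated frequency.

```python
def derive_clocks(master_mhz, divisor):
    """Derive clock frequencies (MHz) and the RClock lead (ns) from MasterClock."""
    if divisor not in (2, 3, 4):
        raise ValueError("PClock-to-SClock divisor must be 2, 3, or 4")
    pclock = 2 * master_mhz            # PClock runs at twice MasterClock
    sclock = pclock / divisor          # SClock is PClock divided by 2, 3, or 4
    sclock_period_ns = 1000.0 / sclock
    return {
        "PClock_MHz": pclock,
        "SClock_MHz": sclock,
        "TClock_MHz": sclock,          # TClock is identical to SClock
        "RClock_MHz": sclock,          # RClock has the same frequency
        "RClock_lead_ns": 0.25 * sclock_period_ns,  # leads TClock by 25%
    }
```

With an assumed 50 MHz MasterClock and a divisor of 2, PClock is 100 MHz, SClock is 50 MHz, and RClock leads TClock by 5 ns.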
Figure 10-1 Processor Clocks, PClock to SClock Divisor of 2

Figure 10-2 Processor Clocks, PClock to SClock Divisor of 4



System Timing Parameters 



Data provided to the processor must be stable a minimum of tDS
nanoseconds (ns) before the rising edge of SClock and held valid for a
minimum of tDH ns after the rising edge of SClock. This setup and
hold time is required for data to propagate through the processor's
input buffers and meet the setup and hold time requirements of the
processor's input latches.

Data provided by the processor becomes stable a minimum of tDM ns
after the rising edge of SClock and a maximum of tDO ns after the
rising edge of SClock. This drive-off time is the sum of the maximum
delay through the processor's output drivers and the maximum clock
to Q delay of the processor's output registers.

Certain processor inputs (specifically VCCOk, ColdReset*, and
Reset*) are sampled based on MasterClock, while certain processor
outputs (specifically Status(7:0)) are driven out based on MasterClock.
The same setup, hold, and drive-off parameters, tDS, tDH, tDM, and
tDO, apply to these inputs and outputs, but they are with respect to
MasterClock instead of SClock.

The values of tDS, tDH, tDM, and tDO for the R4000 processor are
tabulated in AC Characteristics.

The alignment of SyncOut, PClock, SClock, TClock, and RClock is
accomplished by the processor with internal Phase Locked Loop
(PLL) circuits that generate aligned clocks based on SyncOut/SyncIn.
PLL circuits by their nature are only capable of generating aligned
clocks for MasterClock frequencies within a limited range. Minimum
and maximum frequencies for MasterClock for various speed ratings
of the R4000 processor are tabulated in AC Characteristics.
Clocks generated using PLL circuits contain some inherent
inaccuracy, or jitter, in their alignment with respect to the
MasterClock. That is, a clock aligned with MasterClock by the
processor's PLL circuits may lead or trail MasterClock by an amount
as large as the related maximum jitter. Maximum jitter for the clocks
generated by various speed ratings of the R4000 processor is tabulated
in AC Characteristics.

Clock Interfacing to a Phase-Locked System 

When the processor is used in a phase-locked system, the components 
of the external agent must phase lock their operation to a common 
MasterClock. In such a system, the delivery of data and the sampling 
of data has common characteristics for all components, even if the 
components have different delay values. The transmission time (the 
amount of time a signal has to propagate along the trace from one 
component to another) between any two components A and B of a 
phase locked system can be calculated from the following equation: 

Transmission Time = (SClock period) - (tDO for A) - (tDS for B)
        - (Clock Jitter for A Max) - (Clock Jitter for B Max)
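
As a worked illustration of the phase-locked equation, the Python sketch below subtracts the output delay of the driving component, the setup time of the receiving component, and both jitter terms from the SClock period. The function name and the numeric values in the usage note are hypothetical; real values come from AC Characteristics and the data sheets of the other components.

```python
def phase_locked_transmission_time(sclock_period, t_do_a, t_ds_b,
                                   jitter_a, jitter_b):
    """Time (ns) available for a signal to travel from component A to B."""
    return sclock_period - t_do_a - t_ds_b - jitter_a - jitter_b
```

With assumed values of a 20 ns SClock period, 8 ns output delay for A, 4 ns setup for B, and 0.5 ns maximum jitter on each component, 7 ns remain for trace propagation. A negative result would indicate that the two components cannot communicate at that SClock period.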

A block-level diagram of a phase-locked system employing the R4000
processor is shown in Figure 10-3.

Figure 10-3 Phase-Locked System Employing the R4000 Processor

Clock Interfacing to a System Without Phase-Lock 

When the processor is used in a system in which the external agent
cannot phase lock to a common MasterClock, the output clocks
RClock and TClock may be used to clock the remainder of the system.
Two clocking methodologies are shown below: one for interfacing to
gate-array devices, and one for interfacing to discrete CMOS logic
devices.

Interface to Gate-Array System 

When interfacing to a gate-array system, both RClock and TClock are
used for clocking within the gate arrays. The gate array buffers
RClock internally and uses the buffered version to clock registers that
sample processor outputs. These sample registers should be
immediately followed by staging registers clocked by an internally
buffered version of TClock. The buffered version of TClock should be
used as the global system clock for the logic inside the gate array and
as the clock for all registers that drive processor inputs.
The use of staging registers places a constraint on the sum of the
clock-to-Q delay of the sample registers and the setup time of the
synchronizing registers inside the gate arrays:

Clock-to-Q Delay + Setup of Synch Register < 0.25 (RClock period)
        - (Maximum Clock Jitter for RClock)
        - (Maximum Delay Mismatch for Internal Clock Buffers
           on RClock and TClock)

The transmission time for a signal from the processor to an external
agent composed of gate arrays in a system without phase lock can be
calculated from the following equation:

Transmission Time = (75% of TClock period) - (tDO for R4000)
        + (Minimum External Clock Buffer Delay)
        - (External Sample Register Setup Time)
        - (Maximum Clock Jitter for R4000 Internal Clocks)
        - (Maximum Clock Jitter for RClock)

The transmission time for a signal from an external agent composed of
gate arrays to the processor in a system without phase lock can be
calculated from the following equation:

Transmission Time = (TClock period) - (tDS for R4000)
        - (Maximum External Clock Buffer Delay)
        - (Maximum External Output Register Clock to Q Delay)
        - (Maximum Clock Jitter for TClock)
        - (Maximum Clock Jitter for R4000 Internal Clocks)
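
The two gate-array transmission time equations can be evaluated as follows. This Python sketch is a hedged illustration; the function and parameter names are hypothetical, all times are in nanoseconds, and the values in the usage note are invented for the arithmetic rather than taken from any data sheet.

```python
def processor_to_agent_time(tclock_period, t_do, min_buffer_delay,
                            sample_setup, internal_jitter, rclock_jitter):
    """Processor to gate-array agent, sampled on buffered RClock."""
    return (0.75 * tclock_period - t_do + min_buffer_delay
            - sample_setup - internal_jitter - rclock_jitter)


def agent_to_processor_time(tclock_period, t_ds, max_buffer_delay,
                            max_clock_to_q, tclock_jitter, internal_jitter):
    """Gate-array agent to processor, driven on buffered TClock."""
    return (tclock_period - t_ds - max_buffer_delay
            - max_clock_to_q - tclock_jitter - internal_jitter)
```

With an assumed 20 ns TClock period, processor_to_agent_time(20.0, 8.0, 1.0, 2.0, 0.5, 0.5) leaves 5 ns, and agent_to_processor_time(20.0, 4.0, 2.0, 3.0, 0.5, 0.5) leaves 10 ns. The 75% factor in the first equation reflects that RClock leads TClock by a quarter of the cycle.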

A block-level diagram of a system without phase lock, employing the
R4000 processor and an external agent implemented as a gate array, is
shown in Figure 10-4.

Figure 10-4 System Without Phase Lock Employing the R4000 Processor

Interface to CMOS Logic System



When interfacing to a CMOS logic system, matched delay clock buffers
are used to allow the processor to generate aligned clocks for the
external logic. One of the matched delay clock buffers is inserted in the
processor's SyncOut-SyncIn clock alignment path, skewing SyncOut,
MasterOut, RClock, and TClock to lead MasterClock by the delay of
the matched delay clock buffer while leaving PClock aligned with
MasterClock. The remaining matched delay clock buffers can be used
to generate a buffered version of TClock aligned with MasterClock.
The alignment error of the buffered version of TClock is the sum of the
maximum delay mismatch of the matched delay clock buffers and the
maximum clock jitter of TClock. The buffered version of TClock is
used to clock registers that sample processor outputs, as the global
system clock for the discrete logic that forms the external agent, and to
clock registers that drive processor inputs.

The transmission time for a signal from the processor to an external
agent composed of discrete CMOS logic devices can be calculated
from the following equation:

Transmission Time = (TClock period) - (tDO for R4000)
        - (External Sample Register Setup Time)
        - (Maximum External Clock Buffer Delay Mismatch)
        - (Maximum Clock Jitter for R4000 Internal Clocks)
        - (Maximum Clock Jitter for TClock)

The transmission time for a signal from an external agent composed of
discrete CMOS logic devices to the processor can be calculated from
the following equation:

Transmission Time = (TClock period) - (tDS for R4000)
        - (Maximum External Output Register Clock to Q Delay)
        - (Maximum External Clock Buffer Delay Mismatch)
        - (Maximum Clock Jitter for R4000 Internal Clocks)
        - (Maximum Clock Jitter for TClock)

Note that, using this clocking methodology, the hold time of data
driven from the processor to an external sampling register is a critical
parameter. In order to guarantee hold time, the minimum output
delay of the processor, tDM, must be greater than the sum of the
minimum hold time for the external sampling register, the maximum
clock jitter for R4000 internal clocks, the maximum clock jitter for
TClock, and the maximum delay mismatch of the external clock
buffers.
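
The hold-time condition just stated reduces to a single comparison. The Python sketch below is illustrative; hold_time_guaranteed is a hypothetical name, and the example numbers are assumptions, not R4000 specifications.

```python
def hold_time_guaranteed(t_dm, min_hold, internal_jitter,
                         tclock_jitter, buffer_mismatch):
    """True if the processor's minimum output delay covers the hold budget."""
    budget = min_hold + internal_jitter + tclock_jitter + buffer_mismatch
    return t_dm > budget
```

For instance, with an assumed 1 ns register hold time and 0.5 ns for each of the three uncertainty terms, the budget is 2.5 ns, so a minimum output delay of 3 ns passes and 2 ns fails.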
A block-level diagram of a system without phase lock employing the
R4000 processor and an external agent composed of both a gate array
and discrete CMOS logic devices is shown in Figure 10-5.

Figure 10-5 System Without Phase Lock Employing the R4000 Processor

Processor Status Outputs



The R4000 processor provides eight status outputs, Status(7:0), that
change with each rising edge of MasterClock to indicate the
processor's internal state during each of the two most recent PCycles.
Status(7:0) is treated as two fields: Status(3:0) indicates the processor's
internal state during the most recent PCycle, and Status(7:4) indicates
the processor's internal state during the PCycle preceding the most
recent PCycle. The encoding of processor internal state for Status(7:4)
or Status(3:0) is shown in Table 10-1. The four-bit decode describes the
instruction occupying the WB stage during a given PCycle.
Table 10-1 Encoding of Processor Internal State for Status(7:4) or Status(3:0)

Status(7:4) or
Status(3:0)      Processor internal state
0                Run cycle:    Other Integer instruction
1                Run cycle:    Integer Load
2                Run cycle:    Integer Untaken Branch
3                Run cycle:    Integer Taken Branch
4                Run cycle:    Integer Store
5                Reserved
6                Reserved
7                Run cycle:    Killed by integer slip
8                Stall cycle:  Other stall type
9                Stall cycle:  Primary Instruction Cache
a                Stall cycle:  Primary Data Cache
b                Stall cycle:  Secondary Cache
c                Run cycle:    Floating-Point
d                Run cycle:    Killed by branch
e                Run cycle:    Killed by exception
f                Run cycle:    Killed by floating point slip
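
An external monitor sampling Status(7:0) on each rising edge of MasterClock could split the byte into its two four-bit fields and look each one up. The following Python sketch illustrates this; decode_status and STATUS_STATES are hypothetical names, and the strings paraphrase the table entries.

```python
STATUS_STATES = {
    0x0: "run: other integer instruction",
    0x1: "run: integer load",
    0x2: "run: integer untaken branch",
    0x3: "run: integer taken branch",
    0x4: "run: integer store",
    0x5: "reserved",
    0x6: "reserved",
    0x7: "run: killed by integer slip",
    0x8: "stall: other stall type",
    0x9: "stall: primary instruction cache",
    0xA: "stall: primary data cache",
    0xB: "stall: secondary cache",
    0xC: "run: floating-point",
    0xD: "run: killed by branch",
    0xE: "run: killed by exception",
    0xF: "run: killed by floating point slip",
}


def decode_status(status):
    """Return (preceding PCycle state, most recent PCycle state)."""
    preceding = STATUS_STATES[(status >> 4) & 0xF]   # Status(7:4)
    most_recent = STATUS_STATES[status & 0xF]        # Status(3:0)
    return preceding, most_recent
```

For example, a sampled value of 0x91 indicates a primary instruction cache stall in the earlier PCycle followed by an integer load in the most recent PCycle.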
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This chapter contains a description of the cache memory hierarchy, 
the operation of the primary and secondary caches and the R4000's 
interface to the secondary cache, and cache-coherent operation in a 
multiprocessor system. 

Cache Organization 

This section describes the organization of the on-chip primary caches 
and the optional off-chip secondary cache. 

Primary Caches 

The R4000 maintains the following four primary cache states: 

• Invalid 

• Shared 

• Clean Exclusive

• Dirty Exclusive 

The cache state of a line in the processor's primary cache indicates the 
validity, shared, dirty, and ownership attributes of the cache line. 

• A cache line that does not contain valid information must 
be marked invalid. 

• A cache line in any state other than invalid contains valid 
information. 

• A cache line that is present in more than one cache in the
system is said to be shared and must be in one of the shared
states.

• A cache line that is present in exactly one cache in the
system is said to be exclusive and may be in one of the
exclusive states.

• A cache line that contains data that is consistent with 
memory is said to be clean and may be in one of the clean 
states. 

• A cache line that contains data that is not consistent with 
memory is said to be dirty and must be in one of the dirty 
or shared states. 

A cache line can have only one owner at a time. The owner of a cache
line is responsible for providing the current contents in the cache line
on any read request. A cache line is owned by the processor if the state
of the secondary cache line is dirty exclusive, dirty shared, or if the
state of the primary cache line is dirty exclusive when no secondary
cache is present. Clean cache lines are always owned by memory.
In addition, if the owner of a cache line is a processor, that processor is
responsible for writing the cache line back to memory when it is
replaced in the course of either satisfying a cache miss or during the
execution of a Writeback or Writeback Invalidate cache instruction.

Primary Instruction Cache 

The R4000 primary instruction cache is: 

• Direct-mapped. 

• Indexed with a virtual address. 

• Checked with a physical tag. 

• Organized with either a 4-word (16-byte) or 8-word (32- 
byte) cache line. 

The primary instruction cache states are determined by the following
cache line attribute:

Invalid       The cache line does not contain valid information.

Each line of instruction cache data has an associated 26-bit tag. The tag
contains a 24-bit physical address, a single valid bit, and a parity bit.
Byte parity is used on the instruction data. The format of an 8-word
(32-byte) primary instruction cache line is shown in Figure 11-1.

Tag (26 bits):

Bits      Field   Description
25        P       Even parity for the PTag and V fields
24        V       Valid bit
23..0     PTag    Physical tag (bits 35..12 of the physical address)

Data (four 72-bit entries per 8-word line):

Bits      Field   Description
71..64    DataP   Even parity, 1 parity bit per byte of data
63..0     Data    Cache data

Figure 11-1 Format of R4000 8-Word Primary Instruction Cache Line
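
The even parity fields can be illustrated with a short computation. In even parity, the parity bit is chosen so that the protected bits plus the parity bit contain an even number of ones. The Python sketch below is a hedged illustration: the helper names are hypothetical, and the assignment of parity bit i to data byte i, as well as the placement of V above PTag when forming the tag parity, are assumptions made for the example.

```python
def even_parity_bit(value):
    """Parity bit that makes the total count of ones even."""
    return bin(value).count("1") & 1


def data_parity(data64):
    """Eight parity bits, one per byte of a 64-bit data entry.

    Bit i of the result protects byte i (an assumed ordering).
    """
    parity = 0
    for byte_index in range(8):
        byte = (data64 >> (8 * byte_index)) & 0xFF
        parity |= even_parity_bit(byte) << byte_index
    return parity


def tag_parity(ptag, valid):
    """Single parity bit over the 24-bit PTag and the V bit."""
    return even_parity_bit(((valid & 1) << 24) | (ptag & 0xFFFFFF))
```

For example, a data entry of 0x0101 has a single one in each of its two low bytes, so parity bits 0 and 1 are set.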

The 4-word primary instruction cache line is accessed using 2 PCLK
cycles; the 8-word primary instruction cache line is accessed using 4 
PCLK cycles. 

Primary Data Cache 

The R4000 primary data cache is: 

• Write-back. 

• Direct-mapped. 

• Indexed with a virtual address. 

• Checked with a physical tag. 

• Organized with either a 4-word (16-byte) or 8-word (32-byte)
cache line.

The primary cache states indicate the following cache line attributes:

Invalid           The cache line does not contain valid
                  information.

Shared            The cache line contains valid information
                  and may be present in another cache. The
                  cache line may or may not be consistent
                  with memory, and may or may not be
                  owned.

Clean Exclusive   The cache line contains valid information
                  and is not present in any other cache. The
                  cache line is consistent with memory and is
                  not owned.

Dirty Exclusive   The cache line contains valid information
                  and is not present in any other cache. The
                  cache line is inconsistent with memory and
                  is owned by a processor.

Each line of primary cache data has an associated 29-bit tag. The tag
contains a 24-bit physical address, 2-bit cache line state, and a
write-back bit. The write-back bit has its own parity bit, and the tag and
cache line state share a parity bit.

Figure 11-2 shows the format of an 8-word (32-byte) primary data cache
line.

Tag (29 bits):

Bits      Field   Description
28        W'      Even parity for the write-back bit
27        W       Write-back bit (set if data is modified and different from
                  secondary cache and memory)
26        P       Even parity for the PTag and CS fields
25..24    CS      Primary cache state:
                  0  Invalid (all R4000 configurations)
                  1  Shared, either Clean or Dirty (R4000MC configurations only)
                  2  Clean Exclusive (R4000SC and MC configurations)
                  3  Dirty Exclusive (all R4000 configurations)
23..0     PTag    Physical tag (bits 35..12 of the physical address)

Data (four 72-bit entries per 8-word line):

Bits      Field   Description
71..64    DataP   Even parity for the data
63..0     Data    Cache data

Figure 11-2 Format of R4000 8-Word Primary Data Cache Line

In all R4000 processors, the W (write-back) bit, not the cache state,
indicates when the primary cache contains modified data that must be
written back to memory or the secondary cache.

In the R4000PC, the states Invalid and Dirty Exclusive are used to
describe the cache line. In the R4000SC, the states Invalid, Clean
Exclusive, and Dirty Exclusive are used to describe the cache line. In the
R4000MC, all four states are used to describe the cache line and to
control whether load and store operations need to access the
secondary cache for coherency purposes. The effects of load and store
operations for the four primary cache states are described in
Table 11-2.

Table 11-1 R4000MC Data Cache Coherency States

Primary          Secondary        Action   Action on Store
Cache State      Cache State      on Load
---------------  ---------------  -------  ----------------------------------
Invalid          All              Miss     Miss

Shared           Shared           None     Read secondary tag. If the
                 Dirty Shared     None     coherency algorithm is Update on
                                           Write, then send update and set
                                           the secondary cache state to
                                           Dirty Shared. If the coherency
                                           algorithm is Invalidate on Write,
                                           then send invalidate and set the
                                           primary and secondary cache
                                           states to Dirty Exclusive.
                 Dirty Exclusive  None     If Dirty Exclusive, set the
                                           primary cache state to Dirty
                                           Exclusive.

Clean Exclusive  Clean Exclusive  None     Set the primary and secondary
                                           cache states to Dirty Exclusive.
                 Dirty Exclusive  None     Set the primary data cache state
                                           to Dirty Exclusive.

Dirty Exclusive  Dirty Exclusive  None     None

When the primary cache is filled from the secondary cache, the 
secondary cache state is mapped into the primary cache state by 
mapping the Shared and Dirty Shared secondary states into the Shared 
primary state. The Dirty Exclusive primary state allows the primary 
cache to be written without a secondary access. 
The 4-word primary data cache line is accessed using two PCLK 
cycles; the 8-word primary data cache line is accessed using four 
PCLK cycles. 



Secondary Cache 



The R4000 is designed to operate with an external secondary cache. 
The secondary cache is accessible to the processor and to the system 
interface. The cache contains data, cache tags and cache line state bits. 
R4000 processors support an optional external secondary cache which
can be configured at chip reset as either a single joint cache or a
separate I-cache and D-cache. This secondary cache is:

• Write-back.

• Direct-mapped.

• Indexed with a physical address. 

• Checked with a physical tag. 

• Organized with either a 4-word (16-byte), 8-word (32-byte), 
16-word (64-byte), or 32-word (128-byte) cache line. 

The secondary cache states indicate the following cache line attributes:

Invalid             The cache line does not contain valid
                    information.

Shared              The cache line contains valid information
                    and may be present in another cache. The
                    cache line may or may not be consistent
                    with memory, and is not owned.

Dirty Shared        The cache line contains valid information
                    and may be present in another cache. The
                    cache line is inconsistent with memory and
                    is owned.

Clean Exclusive     The cache line contains valid information
                    and is not present in any other cache. The
                    cache line is consistent with memory and is
                    not owned.

Dirty Exclusive     The cache line contains valid information
                    and is not present in any other cache. The
                    cache line is inconsistent with memory and
                    is owned.

The primary cache state Shared corresponds to the secondary cache
states Shared and Dirty Shared.

The secondary-cache line has an associated 19-bit tag that contains
bits 35..17 of the physical address, a 3-bit primary cache index, and a
3-bit cache line state. These 25 bits are protected by a 7-bit error
correction code (ECC).

Figure 11-3 shows the format of the R4000 secondary-cache line.

    bits 31..25    ECC (7 bits)
    bits 24..22    CS (3 bits)
    bits 21..19    PIdx (3 bits)
    bits 18..0     STag (19 bits)

ECC      ECC for secondary tag
CS       secondary-cache state
             0  Invalid
             1  reserved
             2  reserved
             3  reserved
             4  Clean Exclusive
             5  Dirty Exclusive
             6  Shared
             7  Dirty Shared
PIdx     primary cache index (bits 14..12 of the virtual address)
STag     physical tag (bits 35..17 of the physical address)

Figure 11-3 Format of R4000 Secondary Cache Line

The secondary-cache state (CS bits) indicates whether:

• The cache line data and tag are valid.

• The data is at least potentially present in the caches of
other processors (Shared versus Exclusive).

• The processor is responsible for updating main memory
(Clean versus Dirty).

The primary caches are a subset of the secondary cache. The processor
maintains this subset property by checking and invalidating the
primary caches, if necessary, when a secondary cache line is replaced.
The PIdx field provides the processor with an index to the virtual (not
physical) address of primary cache lines that may contain data from
the secondary cache line.

A second function of the PIdx field is to detect a cache alias. If the
physical address tag matches during a data reference to the secondary
cache (S-cache), but the PIdx field does not match the appropriate bits
in the virtual address, the reference was made from a different virtual
address than the one that created the secondary-cache line. Since this
could create a cache alias, the processor signals this condition by
taking a Virtual Coherency exception (see Chapter 5, Exception
Processing).
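
The alias check described above can be illustrated with a short Python
sketch. It assumes the secondary tag word is laid out as in Figure 11-3
(ECC in bits 31..25, CS in bits 24..22, the primary cache index in bits
21..19, and the physical tag in bits 18..0). The function names are
hypothetical and chosen only for illustration; this models the
comparison, not the processor's actual logic.

```python
def decode_secondary_tag(tag_word):
    """Split a 32-bit secondary-cache tag word into the Figure 11-3 fields."""
    return {
        "ecc": (tag_word >> 25) & 0x7F,   # bits 31..25
        "cs": (tag_word >> 22) & 0x7,     # bits 24..22
        "pidx": (tag_word >> 19) & 0x7,   # bits 21..19
        "stag": tag_word & 0x7FFFF,       # bits 18..0
    }


def is_virtual_alias(tag_word, vaddr, paddr):
    """Return True when the physical tag matches but the stored primary
    cache index differs from bits 14..12 of the referencing virtual address."""
    fields = decode_secondary_tag(tag_word)
    tag_match = fields["stag"] == ((paddr >> 17) & 0x7FFFF)
    index_match = fields["pidx"] == ((vaddr >> 12) & 0x7)
    return tag_match and not index_match
```

In this sketch, a True result corresponds to the condition under which
the text says a Virtual Coherency exception is taken.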
Primary and Secondary Cache Interaction



The primary caches are proper subsets of the secondary cache. In the 
R4000PC, the Invalid and Dirty Exclusive states are used to describe 
the cache line. In the R4000SC, the Invalid, Clean Exclusive, and Dirty 
Exclusive states are used to describe the cache line. In the R4000MC, 
all four states are used to describe the cache line and to control 
whether load and store operations need to access the secondary cache 
for coherency purposes. The effects of the load and store operations 
for the four primary cache states are described in Table 11-2. This table
can be better understood by realizing that there may be many primary
cache lines for each secondary cache line.

Table 11-2 R4000MC Data Cache Coherency States

Primary          Secondary        Action   Action on Store
Cache State      Cache State      on Load
---------------  ---------------  -------  ----------------------------------
Invalid          All              Miss     Miss

Shared           Shared           None     Read secondary tag. If the
                 Dirty Shared     None     coherency algorithm is Update on
                                           Write, then send update and set
                                           the secondary cache state to
                                           Dirty Shared. If the coherency
                                           algorithm is Invalidate on Write,
                                           then send invalidate and set the
                                           primary and secondary cache
                                           states to Dirty Exclusive.
                 Dirty Exclusive  None     If Dirty Exclusive, set the
                                           primary cache state to Dirty
                                           Exclusive.

Clean Exclusive  Clean Exclusive  None     Set the primary and secondary
                                           cache states to Dirty Exclusive.
                 Dirty Exclusive  None     Set the primary data cache state
                                           to Dirty Exclusive.

Dirty Exclusive  Dirty Exclusive  None     None



Upon a cache miss in both the primary and the secondary cache, the
missing secondary cache line is loaded from memory into the
secondary cache first. Then the appropriate subset is loaded into the
primary cache. When the primary cache is filled from the secondary
cache, the secondary cache state is mapped into the primary cache
state by mapping the Shared and Dirty Shared secondary states into the
Shared primary cache state. The Dirty Exclusive primary cache state
allows the primary cache to be written without a secondary cache
access.
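
The state mapping on a primary cache fill can be sketched in Python as
follows. The text states only that Shared and Dirty Shared both map to
the primary Shared state; mapping Clean Exclusive and Dirty Exclusive
to the primary states of the same name is an assumption made here for
illustration, and the function names are hypothetical.

```python
# Secondary state -> primary state on a fill. Only the two Shared
# entries are stated in the text; the Exclusive entries are assumed.
PRIMARY_FROM_SECONDARY = {
    "Shared": "Shared",
    "Dirty Shared": "Shared",
    "Clean Exclusive": "Clean Exclusive",
    "Dirty Exclusive": "Dirty Exclusive",
}


def primary_state_on_fill(secondary_state):
    """Return the primary cache state used when filling from the secondary."""
    if secondary_state not in PRIMARY_FROM_SECONDARY:
        # An Invalid secondary line holds no data to fill from.
        raise ValueError("cannot fill from state: " + secondary_state)
    return PRIMARY_FROM_SECONDARY[secondary_state]


def store_needs_secondary_access(primary_state):
    """Only a Dirty Exclusive primary line can be written without a
    secondary cache access (Table 11-2)."""
    return primary_state != "Dirty Exclusive"
```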
Cache Line Ownership

The R4000 requires that cache lines have a single owner at all times. 
The owner of a cache line is responsible for providing the current 
contents in the cache line to any read requestor. The ownership of a 
cache line is set and maintained as follows: 

• A processor assumes ownership of the cache line if the state
of the secondary cache line is dirty shared or dirty exclusive, or if
the state of the primary cache line is dirty exclusive when no
secondary cache is present. For responses to processor
coherent read requests in which the data is returned with
an indication that it must be loaded in the dirty shared or
dirty exclusive state, the cache state is set when the last
word of read response data is returned. Therefore, the
processor assumes ownership of the cache line when the
last word of read response data is returned.

• The processor gives up ownership of a cache line when the
state of the cache line changes to invalid, shared, or clean
exclusive. For processor coherent write requests, the state of
the cache line changes to invalid if the cache line is
replaced, or to clean exclusive or shared if the cache line is
retained. In either case, the cache state transition occurs
when the last word of write data is transmitted to the
external agent. Therefore, the processor gives up
ownership of the cache line when the last word of write
data is transmitted to the external agent.

• For external requests, other than read responses, any cache 
state change associated with the external request, including 
a change of ownership, occurs at the completion of the 
external request. 

• Clean cache lines are always owned by memory. 

Cache Operation 

This section describes the operation of the R4000 caches. 

Cache Coherency 

The R4000 processor manages its primary and secondary caches using
a write-back methodology; that is, it stores write data into the caches.
A modified cache line is not written back to memory until the cache
line is replaced, either in the course of satisfying a cache miss or during
the execution of a Writeback or Writeback Invalidate cache instruction.
When the contents of a cache line are not consistent with memory, the
line is said to be dirty. Many systems, in particular multiprocessor
systems or systems that employ input/output (I/O) devices capable of
direct memory access (DMA), may require the system to behave as if
the caches are always consistent with memory and each other.

Schemes for maintaining consistency between multiple writeback
caches or between writeback caches and memory are referred to as
cache coherency protocols.

The R4000MC processor, in its secondary cache mode, provides a set
of cache states and mechanisms for manipulating the contents and
state of the cache that are sufficient to implement a variety of cache
coherency protocols, both snoopy and directory-based. In particular,
the processor supports both the write-invalidate and
write-update protocols simultaneously.

The coherency protocol for lines in the cache is controlled by bits in the
translation look-aside buffer (TLB) on a per-page basis. Specifically,
the TLB contains three bits per entry that control the coherency
attributes of a page. The three bits are encoded to provide five possible
coherency attributes per page:

• uncached, 

• sharable, 

• update, 

• exclusive, and 

• noncoherent. 

A processor in the no-secondary cache mode supports only the
uncached and noncoherent coherency attributes.

If a page has the uncached coherency attribute, the processor issues a
word or partial word read or write directly to main memory for any
load or store to a location within that page. Lines within an uncached
page are assumed never to be cache-resident.

If the coherency attribute is sharable, the processor issues a coherent
block read for a load miss to a location within the page, and a coherent
block read that requests exclusivity for a store miss to a location within
the page. In most systems, coherent reads require snoops or directory
checks to occur; noncoherent reads do not. A coherent read that
requests exclusivity implies that the processor functions most
efficiently if the requested cache line is returned to it in an exclusive
state, but the processor still performs correctly if the cache line is
returned in a shared state. Cache lines within the page are managed
with a write invalidate protocol; that is, the processor issues an
invalidate on a store hit to a shared cache line.

If the coherency attribute is update, the processor issues a coherent
block read for a load or store miss to a location within the page. Cache
lines within the page are managed with a write update protocol; that is,
the processor issues an update on a store hit to a shared cache line.

If the coherency attribute is exclusive, the processor issues a coherent
block read that requests exclusivity for a load or store miss to a
location within the page. Cache lines within the page are managed
with a write invalidate protocol. Load Linked/Store Conditional
instruction sequences must ensure that the link location is not in a
page managed with the exclusive coherency attribute.

If the coherency attribute is noncoherent, the processor issues a
noncoherent block read for a load or store miss to a location within the
page.

The behavior of the processor on load misses, store misses, and store
hits to shared cache lines for each of the coherency attributes is
summarized in Table 11-3.

Table 11-3 Coherency Attributes and Processor Behavior

Attribute    Load Miss                Store Miss               Store Hit Shared
-----------  -----------------------  -----------------------  ----------------
Uncached     Main memory read         Main memory write        NA
Noncoherent  Noncoherent read         Noncoherent read         Invalidate*
Exclusive    Coherent read exclusive  Coherent read exclusive  Invalidate*
Sharable     Coherent read            Coherent read exclusive  Invalidate
Update       Coherent read            Coherent read            Update

NOTE: *This should not occur under normal circumstances.
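
The behavior summarized in Table 11-3 lends itself to a simple lookup.
The following Python sketch encodes the table as data; the dictionary,
the event names, and the function name are hypothetical and serve only
to illustrate how the page attribute selects the request type.

```python
# Rows of Table 11-3: (load miss, store miss, store hit to a shared line).
# None stands for the NA entry. For the noncoherent and exclusive
# attributes, a store hit to a shared line should not normally occur.
COHERENCY_BEHAVIOR = {
    "uncached": ("main memory read", "main memory write", None),
    "noncoherent": ("noncoherent read", "noncoherent read", "invalidate"),
    "exclusive": ("coherent read exclusive", "coherent read exclusive",
                  "invalidate"),
    "sharable": ("coherent read", "coherent read exclusive", "invalidate"),
    "update": ("coherent read", "coherent read", "update"),
}

EVENTS = ("load_miss", "store_miss", "store_hit_shared")


def request_for(attribute, event):
    """Return the request the processor issues for a page attribute and event."""
    return COHERENCY_BEHAVIOR[attribute][EVENTS.index(event)]
```

For example, request_for("sharable", "store_miss") returns
"coherent read exclusive", matching the Sharable row of the table.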

The following sections describe: 

• The cache state transitions performed by the processor 
during execution. 

• The mechanisms provided for an external agent to 
manipulate the state and contents of the primary and 
secondary cache. 
Cache State Changes

The initial state of a cache line is specified by the external agent when 
it supplies the cache line. During the course of processor execution, 
the processor may change the state of a cache line. The following 
events cause changes to the state of the cache: 

• A store to a clean exclusive cache line causes the state to be 
changed to dirty exclusive in both the primary and 
secondary caches. 

• A store to a shared cache line, that is a line marked shared in 
the primary cache and either shared or dirty shared in the 
secondary cache, will cause the processor to issue either an 
invalidate request or an update request depending on the 
coherency attribute in the TLB entry for the page 
containing the cache line. Upon successful completion of 
an invalidate, the processor completes the store and 
changes the state of the cache line to dirty exclusive in both 
the primary and secondary caches. Upon successful 
completion of an update, the processor completes the store 
and changes the state of the cache line to shared in the 
primary cache and dirty shared in the secondary cache if 
dirty shared mode is enabled. Dirty shared mode is 
programmable via the boot-time mode control interface 
described in Chapter 12. If dirty shared mode is not enabled, 
the state of the primary and secondary caches will be left 
unchanged after successful completion of an update. 
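
The two store events listed above can be sketched as a small Python
function. It returns the request the processor issues, if any, and the
resulting primary and secondary states after successful completion.
The function name, the argument names, and the use of strings for
states are illustrative assumptions, not part of the R4000 interface.

```python
DIRTY_EXCLUSIVE = "Dirty Exclusive"


def store_state_change(primary, secondary, attribute, dirty_shared_mode):
    """Return (request, new_primary, new_secondary) for a store hit."""
    if primary == "Clean Exclusive":
        # No coherence request is needed; both caches become Dirty Exclusive.
        return (None, DIRTY_EXCLUSIVE, DIRTY_EXCLUSIVE)
    if primary == "Shared" and secondary in ("Shared", "Dirty Shared"):
        if attribute == "update":
            if dirty_shared_mode:
                return ("update", "Shared", "Dirty Shared")
            # Dirty shared mode disabled: states are left unchanged.
            return ("update", primary, secondary)
        # Other attributes use the write invalidate protocol.
        return ("invalidate", DIRTY_EXCLUSIVE, DIRTY_EXCLUSIVE)
    raise ValueError("case not covered by this sketch")
```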

Figure 11-4 and Figure 11-5 are state diagrams of the primary and
secondary caches, respectively.

[State diagram not reproduced. Transition labels legible in the source:
invalidate received; write hit [update], read hit, update received;
read miss [exclusive]; write hit, read hit; write miss [shared and
update], read miss [shared]; [intervention].]

Figure 11-4 Primary Data Cache State Diagram

[State diagram not reproduced.]

Figure 11-5 Secondary Cache State Diagram



Cache Line Write-Back 



If the cache line is in the dirty exclusive or dirty shared state in the
secondary cache, the processor writes a cache line back to memory
when it is replaced, either in the course of satisfying a cache miss or
during the execution of a Writeback or Writeback Invalidate cache
instruction. When the processor writes a cache line back to memory, it
does not ordinarily retain a copy of the cache line, and the state of the
cache line is changed to invalid. However, under certain conditions
related to load linked and store conditional, or if a cache line is written
back by the Hit Writeback cache instruction, the processor retains a
copy of the cache line. If the cache line is retained, the processor
changes the cache line state to clean exclusive if the secondary cache
state was dirty exclusive before the write, or shared if the secondary
cache state was dirty shared before the write.

Whether or not the processor is retaining the line is signaled by the
processor during a write.

Manipulation of the Caches by an External Agent 

The R4000 provides the following mechanisms for an external agent to 
examine and manipulate the state and contents of the primary and 
secondary caches: 

• An external agent must specify the state in which data, supplied 
in response to a processor read request, loads into the 
processor's caches. Data may be loaded in any of the four valid 
secondary cache states. Data returned by the external agent must 
not be marked invalid. The secondary cache state will be mapped 
by the processor to a primary cache state as previously described. 

• An external agent may issue a snoop request to the processor 
causing the processor to return the secondary cache state of the 
specified cache line. At the same time and according to a 
function supplied by the external agent, it atomically changes 
the state of the specified cache line in both the primary and 
secondary caches. 

• An external agent may issue an invalidate request or an update
request to the processor. An invalidate request causes the
processor to change the state of the specified cache line to
invalid in both the primary and secondary caches. An update
request causes the processor to write the specified data element
into the specified cache line, and either change the state of the
cache line to shared in both the primary and secondary caches, or
leave the state of the cache line unchanged, depending on the
nature of the update request. An external agent may issue
updates, without changing the state of the cache line, to cache
lines that are in either exclusive or shared states.

• An external agent may issue an intervention request causing the 
processor to return the secondary cache state of the specified 
cache line, and, under certain conditions related to the state of 
the cache line and the nature of the intervention request, the 
contents of the specified secondary cache line. At the same time 
and according to a state change function specified by the 
external agent, the processor atomically changes the state of the 
specified cache line in both the primary and secondary caches. 
Ordering Considerations

Many cache coherent multiprocessor systems must obey ordering
constraints on stores to shared data; therefore, they exhibit the same
behavior as a uniprocessor system in a multiprogramming
environment. A multiprocessor system that exhibits such behavior is
said to be strongly ordered.

A typical algorithm for testing strong ordering follows:

Given: locations X and Y have no particular relationship; i.e., they
are not in the same cache line.

Processor A performs a store to location X at the same time processor
B performs a store to location Y. Next, processor A does a load from
location Y at the same time that processor B does a load from location
X. In order for the system to be considered strongly ordered, either
processor A must load the new value of Y, or processor B must load
the new value of X, or both processors A and B must load the new
values of Y and X, respectively, under all conditions. If both processors
A and B load the old values of Y and X, respectively, under any
conditions, the system does not meet the requirements for strong
ordering.

The algorithm to test for strong ordering is summarized below. 

Processor A                 Processor B
------------------------    ------------------------
Store to location X         Store to location Y
Load from location Y        Load from location X

In order for this strong ordering test algorithm to succeed, stores must 
have a global ordering in time; that is, every processor in the system 
must agree that either the store to location X preceded the store to 
location Y, or the store to location Y preceded the store to location X. 
If this global ordering is enforced, this test algorithm for strong 
ordering will succeed. 
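
The claim that a global store order makes the test succeed can be
checked with a small Python model. The sketch below assumes a single
shared memory in which every operation takes effect atomically in one
global order, and enumerates every interleaving that preserves each
processor's program order. It is an illustrative model only, not a
description of the R4000 hardware, and all names in it are hypothetical.

```python
def interleavings(a_ops, b_ops):
    """Yield every merge of two operation lists that keeps program order."""
    if not a_ops:
        yield list(b_ops)
        return
    if not b_ops:
        yield list(a_ops)
        return
    for rest in interleavings(a_ops[1:], b_ops):
        yield [a_ops[0]] + rest
    for rest in interleavings(a_ops, b_ops[1:]):
        yield [b_ops[0]] + rest


def run(schedule):
    """Execute one global order; return (value A loaded, value B loaded)."""
    memory = {"X": "old", "Y": "old"}
    loaded = {}
    for processor, kind, location in schedule:
        if kind == "store":
            memory[location] = "new"
        else:
            loaded[processor] = memory[location]
    return loaded["A"], loaded["B"]


def strong_ordering_outcomes():
    """Collect the load results over all globally ordered executions."""
    a_ops = [("A", "store", "X"), ("A", "load", "Y")]
    b_ops = [("B", "store", "Y"), ("B", "load", "X")]
    return {run(schedule) for schedule in interleavings(a_ops, b_ops)}
```

Under this model the outcome in which both processors load the old
values never appears, which is the condition the test requires.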

The requirements to achieve strong ordering translate into a need for
precise control of when the processor restarts in relationship to a
change in cache state initiated by an external coherence request.
Specifically, before allowing the processor to restart after completion
of a processor coherence request, system designers must ensure
completion of any cache state changes resulting from an external
coherence request that occurs before a processor coherence request.

The R4000 processor obeys the following paradigms for restart after
issuing a coherence request:

For coherent read requests, the processor will restart after either of the
following conditions, unless a processor invalidate or update request
is unacknowledged:

• The requested double word is transmitted to the processor 
if sub-block ordering is enabled, or 

• The last word in the block is transmitted to the processor if 
sequential ordering is enabled. 

Any external requests that must be completed before the read is 
complete must be issued to the processor before the read response is 
issued. 

For coherent write requests, the processor restarts after the write 
request is complete; that is, after the last double word of data 
associated with the write request has been transmitted to the external 
agent, unless a processor read request is pending or a processor 
invalidate or update request is unacknowledged. 

For invalidate and update requests, the processor restarts after the 
assertion of IvdAck* or IvdErr*, unless a processor read request is 
pending or it is processing an external request when IvdAck* or 
IvdErr* are asserted. 

If IvdAck* or IvdErr* are asserted during or after the first cycle that 
the external agent asserts ExtRqst*, the processor will accept the 
external request and complete any cache state changes associated with 
the external request before restarting. 

If IvdAck* or IvdErr* is not asserted during or after the first cycle
that the external agent asserts ExtRqst*, the processor will restart
before beginning the external request.

In summary, external requests that must be completed before a
processor invalidate or update completes can be completed provided
the processor receives an asserted ExtRqst* from the external agent
before or during the same cycle in which either IvdAck* or IvdErr* is
asserted.
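
The timing rule for invalidate and update requests reduces to a single
comparison of cycle numbers. The Python sketch below assumes that each
signal's first assertion is recorded as an integer cycle count; the
function and argument names are hypothetical.

```python
def external_request_completes_first(ext_rqst_cycle, ivd_ack_cycle):
    """Return True if the external request's cache state changes complete
    before the processor restarts.

    ext_rqst_cycle: first cycle in which ExtRqst* is asserted.
    ivd_ack_cycle:  cycle in which IvdAck* or IvdErr* is asserted.
    """
    # The acknowledge arrives during or after the first ExtRqst* cycle.
    return ivd_ack_cycle >= ext_rqst_cycle
```

A False result corresponds to the case in which the processor restarts
before beginning the external request.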

Coherence Conflicts 

This section explains how the R4000 handles coherence conflicts 
caused by competing coherence requests from the processor and an 
external source. The topics in this section are: 

• How Coherence Conflicts Arise 

• System Implications of Coherence Conflicts 

• Handling Coherence Conflicts

This material applies only to the R4000MC, which can issue processor
coherence requests and accept external coherence requests.

How Coherency Conflicts Arise 

The R4000MC processor issues processor coherence requests and 
accepts external coherence requests. 
Processor coherence requests are:

• Processor coherent read requests

• Invalidate requests

• Update requests

External coherence requests are:

• External invalidate requests

• Update requests

• Snoop requests

• Intervention requests

Because of the overlapped nature of the system interface, it is possible
for processor coherence requests and external coherence requests to
conflict. That is, it is possible for an external coherence request to
reference an address that targets the same cache line as a pending
processor read request or an unacknowledged processor invalidate or
update request. The processor does not contain comparators to detect
such conflicts. The processor uses the secondary cache as the single
point of reference to determine the coherency actions it takes, and only
checks the state of the secondary cache at specific times.

For pending processor coherent read requests, conflicting external
requests cannot affect the behavior of the processor. The processor
only issues a read request for a particular cache line if it does not have
a copy of that cache line. Therefore, any external coherence request
that targets a cache line that is also the target of a pending processor
coherent read request will not find the line present in the cache.
External coherence requests do not change the state of the cache unless
the cache line they target is present. Since no change can be made to
the state of the cache for the line that is the target of the pending
processor read request, no external coherence request can affect the
read request. Therefore, external coherence requests that conflict with
a pending processor coherent read request may be issued to the
processor and will effectively be discarded by the processor.

For processor invalidate and update requests, the cancellation
mechanism is provided to signal conflicts to the processor. If a
conflicting external coherence request is submitted while a processor
invalidate or update request has been issued but not yet
acknowledged, an external agent may cancel the processor invalidate
or update. This applies to compulsory updates and invalidates only.
Cancellation is accomplished by setting the cancellation bit in the
command for the coherence request. The processor, upon receiving an
external coherence request with the cancellation bit set, will consider
its invalidate or update request to be acknowledged and canceled, and
will re-access the secondary cache and re-evaluate the cache state to
determine the correct action. This may result in the invalidate or
update request being re-issued, or it may result in the issue of a read
request instead.

An external agent is only allowed to assert the cancellation bit with an 
external coherence request when a processor invalidate or compulsory 
update request is currently unacknowledged. If an external coherence 
request is issued with the cancellation bit set when there is no 
unacknowledged processor invalidate or update request, the behavior 
of the processor is undefined. 

Processor potential update requests may not be cancelled. Potential
updates are always issued under processor read requests and become
compulsory only after the response to the processor read request is
returned to the processor in one of the shared states. If an external
coherence request is issued with the cancellation bit set when a
processor invalidate or update request is still potential, in other
words, while a processor read request is currently pending, the
behavior of the processor is undefined.

An external agent may issue an external coherence request that is in
conflict with an unacknowledged processor invalidate or update
request without setting the cancellation bit. In this case, the processor
will be unaware of the conflict and will not re-evaluate the cache state
to determine the correct action. It will simply wait for an acknowledge
to its invalidate or update request just as it would for any invalidate
or update request. A system employing the R4000 may not behave
correctly under these circumstances.

Note that it is not possible for external coherence requests to conflict
with processor write requests, since external requests are not accepted
while a processor write request is in progress. The interactions
between processor coherence requests and conflicting external
coherence requests, tabulated by processor state, are summarized in
Table 11-4 and Table 11-5.

The processor can be in one of the following states:

• Idle

No processor transactions currently pending.

• Read Pending

A processor coherent read request has been issued, but the
read response has not yet been received.

• Potential Update Unacknowledged

A processor update request has been issued while a processor
coherent read request is pending but has not yet been
acknowledged, and, by definition, the response to the coherent
read request has not yet been received.

• Invalidate or Update Unacknowledged

A processor invalidate or update request has been issued but
has not yet been acknowledged and, by definition, there is not
a processor coherent read request pending.

Table 11-4 Coherence Conflicts Summary

                         Conflicting External Coherence Request
Processor State          Invalidate  Invalidate  Update  Update
                                     w/Cancel            w/Cancel
-----------------------  ----------  ----------  ------  ---------
Idle                     NA          Undefined   NA      Undefined
Read Pending             OK          Undefined   OK      Undefined
Potential Update         OK          Undefined   OK      Undefined
  Unacknowledged
Invalidate or Update     OK*         OK          OK*     OK
  Unacknowledged

* This may cause incorrect system operation and should not normally
be allowed to occur.

Table 11-5 Coherence Conflicts Summary

                         Conflicting External Coherence Request
Processor State          Intervention  Intervention  Snoop  Snoop
                                       w/Cancel             w/Cancel
-----------------------  ------------  ------------  -----  ---------
Idle                     NA            Undefined     NA     Undefined
Read Pending             OK            Undefined     OK     Undefined
Potential Update         OK            Undefined     OK     Undefined
  Unacknowledged
Invalidate or Update     OK*           OK            OK*    OK
  Unacknowledged

* This may cause incorrect system operation and should not normally
be allowed to occur.
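
Tables 11-4 and 11-5 follow the same pattern for all four external
request types, so the outcome depends only on the processor state and
on whether the cancellation bit is set. The Python sketch below
captures that pattern; the state identifiers and the function name are
hypothetical labels for the states listed above.

```python
PROCESSOR_STATES = (
    "idle",
    "read_pending",
    "potential_update_unacknowledged",
    "invalidate_or_update_unacknowledged",
)


def conflict_outcome(processor_state, cancel):
    """Return the table entry for a conflicting external coherence request.

    The result is the same for invalidate, update, intervention, and
    snoop requests. "OK*" marks the case that may cause incorrect
    system operation and should not normally be allowed to occur.
    """
    if processor_state not in PROCESSOR_STATES:
        raise ValueError("unknown processor state: " + processor_state)
    unacknowledged = processor_state == "invalidate_or_update_unacknowledged"
    if cancel:
        # Cancellation is defined only while a compulsory invalidate or
        # update request is unacknowledged.
        return "OK" if unacknowledged else "Undefined"
    if processor_state == "idle":
        return "NA"
    return "OK*" if unacknowledged else "OK"
```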

System Implications of Coherence Conflicts 

The constraints that the processor places on the handling of conflicting 
coherency transactions have certain implications for the design of a 
multiprocessor system employing the R4000. This section will 
consider, as an example, a particular snoopy, split-read, bus-based 
system and the requirements for that system to correctly handle 
coherence conflicts. 

System Model 

The system model consists of the following components: 

1. Four processor subsystems, each consisting of an R4000 
processor, a secondary cache, and an external agent. The agent 
communicates with the R4000, accepting processor requests 
and issuing external requests, and with the system bus, 
likewise issuing and receiving bus requests. 
2. A memory subsystem that communicates with main memory 
and the system bus. 
3. A system bus with the following characteristics: 

- It is a multiple master, request based, arbitrated bus in 
which an agent that wishes to perform a transaction on 
the bus must request the bus and wait for global 
arbitration logic to supply a grant signal before 
assuming mastership of the bus. Once mastership has 
been granted, the agent may begin a transaction. 

- It supports a read transaction, read exclusive 
transaction, write transaction, and invalidate 
transaction. 

- It is a split-read bus in that independent transactions 
may occur on the bus between a read request from a 
particular agent and the return of data by the target of 
the read request. The return of data by the target of the 
read request will be referred to as the read response. 

- It is a snoopy bus in that all agents connected to the 
bus must monitor all of the traffic on the bus to 
correctly maintain cache coherency. 

I/O is not considered in this system model. 

Coherency Model 

The goal, for purposes of this example, is to implement a simple write 
invalidate cache coherency protocol for this system model that 
maintains consistency between all of the caches in the system and 
main memory. Attention is focused on the interactions between the 
system bus and the R4000, and the handling of coherence conflicts that 
arise in this system. 
The coherency model for the system is as follows: 

• All pages in the system are maintained either with the 
noncoherent coherency attribute or with the sharable coherency 
attribute. 

• The handling of noncoherent data will not be considered. 

• Using the sharable coherency attribute allows data to be shared 
between the four caches in the system with a write invalidate 
cache coherency protocol. 

• The secondary cache states used are invalid, shared, clean 
exclusive, and dirty exclusive. 

• The secondary cache state dirty shared is not used in this 
coherency model. 






When a processor misses in both caches on a load, it issues a read 
request. The external agent translates this to a read request on the bus. 
The returned data may be loaded in either the state clean exclusive or 
shared based on a shared indication returned on the bus with the read 
response. The shared indication is based on the result of an 
intervention request to the processor for the cache line of interest, and 
is supplied by the external agents that are a part of the other three 
processor subsystems. When a processor misses in both caches on a 
store, it issues a read request desiring exclusivity; this is translated to 
a read exclusive on the bus and the data is loaded in the state dirty 
exclusive. When a processor hits in the cache on a store to shared data, 
it issues an invalidate request which must be forwarded to the system 
bus before the store can be completed and the state changed to dirty 
exclusive. 

When an external agent observes a coherent read request on the 
system bus it does not take any immediate action. Instead, the external 
agent issues an intervention request to the processor for the read 
request during the read response associated with the read request. 
This is referred to as a response complete read model; that is, the read is 
treated as complete only after the read response has occurred. This 
model requires that cache interrogation for a read must not occur until 
the read response occurs, as described, in order to maintain 
consistency. 

At the end of the read response, each external agent supplies an 
indication on the bus of whether it was able to obtain the state of the 
cache line that is the target of the read via an intervention request and 
if so, the external agent supplies an indication of sharing or takeover. 
Takeover occurs when an external agent discovers that its processor 
has a copy of the cache line that is the target of the read in the state 
dirty exclusive. If any external agent is unable to obtain the state of the 
cache line that is the target of the read because it is unable to initiate 
an intervention request, the read response is extended until all 
external agents have obtained the state of the cache line from their 
processors. 

The response from an external agent at the end of a read response 
depends on whether the read request was an ordinary read request or 
a read exclusive request. 

For an ordinary read request, an external agent indicates shared at the 
end of the read response if it finds that its processor has a copy of the 
requested cache line in the state clean exclusive or shared. If the current 
state of the cache line is clean exclusive, the external agent causes the 
processor to change the state of the cache line to shared. An external 
agent will indicate both shared and takeover at the end of a read 
response to an ordinary read request if it finds that its processor has a 
copy of the requested cache line in the state dirty exclusive. Having 
indicated takeover, the external agent supplies the contents of the 
cache line (returned by the processor in response to its intervention 
request) over the bus to the read requester, and causes the processor 
to change the state of the cache line to shared. At the same time the 
cache line is supplied to the read requester, it is also written back to 
memory. 

For a read exclusive request, an external agent never indicates shared 
at the end of the read response, regardless of the state its processor has 
the cache line in. If the current state of the cache line is clean exclusive 
or shared, the external agent causes the state of the cache line to be 
changed to invalid. If the current state of the cache line is dirty 
exclusive, the external agent indicates takeover but not shared. Having 
indicated takeover, the external agent supplies the contents of the cache 
line over the bus to the read requester, and causes the processor to 
change the state of the cache line to invalid. At the same time the cache 
line is supplied to the read requester, it is also written back to memory. 

An invalidate request is considered complete as soon as it appears on 
the system bus. When an external agent observes an invalidate request 
on the system bus, it must react as if the invalidate has changed the 
state of all caches at that instant. 

An external agent takes no action in response to the appearance of a 
write request on the bus. 
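
The read-response rules of this example coherency model can be summarized as a small decision function. This is a minimal sketch under stated assumptions: the function name `read_response` and the state strings are hypothetical, and the behavior for a line that is already invalid (no indication, no state change) is an assumption, since the text does not spell that case out.

```python
def read_response(line_state, exclusive):
    """Return (shared, takeover, new_state) for one external agent.

    line_state: 'invalid', 'shared', 'clean_exclusive' or 'dirty_exclusive'
                as reported by the intervention request to the processor
    exclusive:  True for a read exclusive, False for an ordinary read
    """
    if line_state == "invalid":
        # Assumption: an absent line produces no indication at all.
        return (False, False, "invalid")
    if line_state == "dirty_exclusive":
        # Takeover: the agent supplies the line to the requester and the
        # line is written back to memory at the same time.
        if exclusive:
            return (False, True, "invalid")
        return (True, True, "shared")
    if line_state in ("shared", "clean_exclusive"):
        if exclusive:
            # Shared is never indicated for a read exclusive.
            return (False, False, "invalid")
        return (True, False, "shared")
    raise ValueError("unknown cache line state")
```

For instance, `read_response("dirty_exclusive", False)` yields both shared and takeover and leaves the line in the shared state, matching the ordinary-read case described above.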

Handling Coherence Conflicts 

Coherence conflicts can be examined based on the current state of the 
processor. In particular, the processor may have a coherent read 
request pending, or it may have an invalidate request 
unacknowledged, or it may not have any requests pending or 
unacknowledged. Note that the read exclusive transaction on the 
system bus guarantees that the requested cache line is returned in an 
exclusive state. Therefore, the issue of potential updates is disabled 
through the boot-time mode control. 

Coherent Read Conflicts 

External coherence requests that conflict with pending processor 
coherent read requests may be issued to the processor without 
affecting the processor's behavior. Therefore, in this simple system 
model no conflict detection is performed by the external agent for 
processor coherent read requests. If an external intervention request 
or invalidate request is forwarded to the processor that is in conflict 
with a pending processor coherent read request, it will not affect the 
processor's cache since the target cache line is guaranteed to be absent 
from the cache. The processor effectively discards the conflicting 
external intervention request, responding with an invalid indication 
for the target cache line. Similarly, the processor will discard a 
conflicting external invalidate request since the target cache line is not 
present and therefore already invalid. 

In a system model similar to the one described, conflict detection 
could be provided for pending processor coherent read requests. In 
this case, when the external agent sees a read response on the bus that 
conflicts with a pending processor coherent read request, it does not 
issue an intervention request to the processor. Rather, it simply reacts 
as if an intervention request has been completed and the cache line is 
not present in the processor's cache. Similarly, when the external 
agent sees an invalidate request on the bus that conflicts with a 
pending processor coherent read request, it does not forward the 
invalidate request to the processor since the target cache line is known 
to be absent from the processor's cache. Using this scheme for conflict 
detection on processor coherent read requests might slightly reduce 
the number of external intervention and invalidate requests issued to 
the processor. However, since the intervention and invalidate 
requests that would otherwise be issued to the processor would not 
result in any state modification within the processor, conflict detection 
for processor coherent read requests is not necessary. 

Invalidate Conflicts 

From the time the processor has issued an invalidate request until the 
request has been acknowledged, any external coherence request 
issued to the processor that is in conflict with the unacknowledged 
invalidate must include a cancellation. In this system, an acknowledge 
for the invalidate will be generated to the processor as soon as the 
invalidate is forwarded to the system bus. Therefore, while the 
external agent is waiting to acquire mastership of the system bus to 
forward an invalidate request, the external agent must detect, via 
comparators, any external coherence request that conflicts with the 
unacknowledged invalidate. If a conflict is detected, the external agent 
must not forward the invalidate request to the system bus. Instead, it 
must throw the invalidate request away and submit the conflicting 
external request to the processor with a cancellation. 

If the response to a coherent read request that conflicts with a waiting 
unacknowledged processor invalidate request appears on the bus, the 
external agent will detect the conflict and will not forward the 
processor invalidate request to the bus. Instead, it throws the 
processor invalidate request away and issues an intervention request 
to the processor that includes a cancellation. The processor then 
re-evaluates its cache state and reissues the invalidate request or issues a 
coherent read request instead. 

If an invalidate request appears on the bus while the external agent 
has a processor invalidate request waiting and the external agent 
detects a conflict, the external agent will not forward the processor 
invalidate request to the bus. Instead, it throws the processor 
invalidate request away and issues an external invalidate request to 
the processor that includes a cancellation. The processor then re- 
evaluates its cache state and reissues the invalidate request or issues a 
coherent read request instead. 

It is not possible for a write request to appear on the system bus that 
conflicts with a waiting processor invalidate request since, for an 
invalidate request to be issued, the state of the cache line must be 
shared in every cache in the system in which the line is present. 
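
The comparator-based handling of a waiting invalidate can be sketched as follows. This is an illustrative model, not the documented behavior of any particular external agent: the function name `handle_bus_event`, the event strings, and the use of plain integers as cache line addresses are all assumptions.

```python
def handle_bus_event(pending_invalidate_line, bus_event, bus_line):
    """Model the external agent while a processor invalidate waits for the bus.

    pending_invalidate_line: line address of the waiting processor invalidate
    bus_event: 'read_response' or 'invalidate' observed on the system bus
    bus_line:  line address carried by that bus event
    Returns (keep_pending_invalidate, external_request, with_cancel).
    """
    # A read response is presented to the processor as an intervention;
    # a bus invalidate is presented as an external invalidate.
    external_request = {
        "read_response": "intervention",
        "invalidate": "invalidate",
    }[bus_event]

    conflict = bus_line == pending_invalidate_line  # the address comparator
    if conflict:
        # Throw the waiting invalidate away and cancel it at the processor,
        # which then re-evaluates its cache state.
        return (False, external_request, True)
    # No conflict: the invalidate keeps waiting for bus mastership.
    return (True, external_request, False)
```

A conflicting read response therefore produces an intervention with cancellation, while a non-conflicting one produces an ordinary intervention and leaves the waiting invalidate in place.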



Coherent Write Conflicts 



As soon as a write request has been issued to the external agent, the 
external agent becomes responsible for the cache line. No conflicts are 
possible with a processor write request; however, the external agent 
must manage ownership of the cache line while it is waiting to acquire 
mastership of the system bus so that it may forward the write request. 
The external agent is responsible for the cache line from the time the 
issue cycle of the write request is accomplished until the write request 
is forwarded to the system bus. 

If the response to a coherent read request that conflicts with a waiting 
processor write request, or with a processor write request that has 
issued and is transmitting data, appears on the bus, the external agent 
will detect the conflict and will not issue an intervention request to the 
processor. Instead, it reacts as if an intervention request has been 
completed and the line is in the dirty exclusive state. The external agent 
indicates takeover and supplies the read data to the read requester itself 
without disturbing the processor. After providing the read data to the 
read requester, the external agent must throw the write request away 
if the read request was a read exclusive. In fact, the external agent may 
throw the write request away for either type of read since processor 
supplied read data is also written back to memory. 

It is not possible for an invalidate request or a write request that 
conflicts with a waiting processor write request to appear on the 
system bus, since, for a processor write request to be issued, the state 
of the cache line must be dirty exclusive in that processor's cache. 
Secondary Cache Interface 

The R4000SC and R4000MC versions of the R4000 interface to an 
optional secondary cache. The secondary cache interface consists of a 
128-bit data bus, a 25-bit tag bus, an 18-bit address bus and SRAM 
control signals. The 128-bit wide data bus minimizes cache miss 
penalty, and allows the use of standard low-cost SRAMs in the 
secondary cache design. 

Secondary Cache Interface Signals 

Following is a secondary cache interface signal summary. 

SCData(127:0):  (i/o) A 128-bit bus used to read or write 
                cache data from/to the secondary cache. 

SCDChk(15:0):   (i/o) A 16-bit bus that carries two 8-bit ECC 
                fields covering the 128 bits of the SCData 
                from/to secondary cache. SCDChk(15:8) 
                corresponds to SCData(127:64) and 
                SCDChk(7:0) corresponds to SCData(63:0). 

SCTag(24:0):    (i/o) A 25-bit bus used to read or write 
                cache tags from/to the secondary cache. 

SCTChk(6:0):    (i/o) A 7-bit bus that carries an ECC field 
                covering the SCTag from/to the secondary 
                cache. 

SCAddr(17:1):   (o) A 17-bit address bus for the secondary 
                cache. 

SCAddr0Z:       (o) Bit 0 of the secondary cache address. 

SCAddr0Y:       (o) Bit 0 of the secondary cache address. 

SCAddr0X:       (o) Bit 0 of the secondary cache address. 

SCAddr0W:       (o) Bit 0 of the secondary cache address. 

SCAPar(2:0):    (o) A 3-bit bus that carries the parity of the 
                SCAddr bus and the cache control lines 
                SCWR*, SCDCS* and SCTCS*. The 
                individual bit definitions are: 
                SCAPar2 - Even Parity for SCAddr(17:12) 
                          and SCWR* 
                SCAPar1 - Even Parity for SCAddr(11:6) 
                          and SCDCS* 
                SCAPar0 - Even Parity for SCAddr(5:0) 
                          and SCTCS* 

SCOE*:          (o) Output enable for the secondary cache 
                RAM. 

SCWrZ*:         (o) Write enable for the secondary cache 
                RAM. 

SCWrY*:         (o) Write enable for the secondary cache 
                RAM. 

SCWrX*:         (o) Write enable for the secondary cache 
                RAM. 

SCWrW*:         (o) Write enable for the secondary cache 
                RAM. 

SCDCS*:         (o) Chip select enable signal for the 
                secondary cache RAM associated with 
                SCData and SCDChk. 

SCTCS*:         (o) Chip select enable signal for the 
                secondary cache RAM associated with 
                SCTag and SCTChk. 

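
The three address parity bits can be modeled as below. This is a sketch under assumptions: it takes "even parity" to mean that each parity bit is chosen so that the covered signals plus the parity bit contain an even number of ones, and it treats the control inputs as the logic levels present on the active-low pins. The function name `scapar` is hypothetical.

```python
def scapar(addr, scwr_n, scdcs_n, sctcs_n):
    """Return the 3-bit SCAPar(2:0) value.

    addr:    18-bit SCAddr(17:0) value
    scwr_n, scdcs_n, sctcs_n: pin levels (0 or 1) of SCWR*, SCDCS*, SCTCS*
    """
    def even_parity(value):
        # 1 when the covered bits hold an odd number of ones, so that the
        # total including the parity bit is even (assumed convention).
        return bin(value).count("1") & 1

    par2 = even_parity(((addr >> 12) & 0x3F) | (scwr_n << 6))   # SCAddr(17:12), SCWR*
    par1 = even_parity(((addr >> 6) & 0x3F) | (scdcs_n << 6))   # SCAddr(11:6), SCDCS*
    par0 = even_parity((addr & 0x3F) | (sctcs_n << 6))          # SCAddr(5:0), SCTCS*
    return (par2 << 2) | (par1 << 1) | par0
```
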
The interface to the secondary cache is designed to maximize the 
efficiency of servicing primary cache misses. The width of the data 
portion of the secondary cache interface is 128 bits to support a data 
rate into the primary cache that is near the processor to primary cache 
bandwidth during normal operation. To assure that this bandwidth is 
maintained, each data, tag and check pin must be connected to only 
one static RAM device. The SCAddr bus and the SCOE*, SCDCS*, 
and SCTCS* signals drive a large number of static RAM devices; 
therefore, one level of external buffering between the R4000 and the 
cache array is necessary. 

The speed of the secondary cache interface is limited by buffered 
control signals. Critical control signals are duplicated to minimize this 
effect. The SCWR* signal and SCAddr(0) are duplicated four times so 
that external buffering will not be required. When an 8-word (256-bit) 
primary cache line is used, these signals can be controlled more 
quickly; this reduces the time of the two back-to-back transfers. These 
duplicated control signals are specified to drive 11 parts each; 
therefore, a total of 44 RAM packages can be used in the cache array. 
This permits a cache design using 16 KByte by 4 bit, 64 KByte by 4 bit, 
or 256 KByte by 4 bit standard static RAMs. Other cache designs are 
also acceptable. For example, a smaller cache design can use twenty-two 
8 KByte by 8 bit static RAMs; this design presents less load on the 
address pins and control signals, and reduces the overall parts count. 

Note that the duplicated signals SCWRW*, SCWRX*, SCWRY*, and 
SCWRZ* will be described in this document as though they were a 
single signal. This signal is called SCWR*. 
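
The package counts quoted above follow from the interface widths, given that each data, tag, and check pin connects to exactly one static RAM. The short sketch below reproduces that arithmetic; the constant and function names are illustrative assumptions.

```python
# Bus widths from the secondary cache interface signal summary.
DATA_BITS = 128        # SCData(127:0)
DATA_CHECK_BITS = 16   # SCDChk(15:0)
TAG_BITS = 25          # SCTag(24:0)
TAG_CHECK_BITS = 7     # SCTChk(6:0)


def ram_packages(bits_per_ram):
    """Number of static RAMs needed when every pin drives exactly one RAM."""
    total_bits = DATA_BITS + DATA_CHECK_BITS + TAG_BITS + TAG_CHECK_BITS
    if total_bits % bits_per_ram:
        raise ValueError("RAM width does not divide the interface width")
    return total_bits // bits_per_ram
```

With 176 pins in total, by-4 parts give 44 packages (four duplicated control signals driving 11 parts each) and by-8 parts give 22.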

The benefit of duplicating SCAddr(0) is greater in systems using fast 
sequential static cache RAMs and a primary cache line size of 8 words. 
If SCAddr(0) is attached to the static RAM address bit that affects 
column decode only, the read cycle time with respect to that pin 
should approximate the output enable time of the RAM. For fast static 
RAM it should be half that of the nominal read cycle time. 

When the split instruction/data cache mode is enabled, assertion of 
the top SCAddr bit, SCAddr(17), enables the instruction half of the 
cache instead of the data half. 

It is possible to design a cache that supports both joint and split 
instruction/data configurations with less than the maximum cache 
size. SCAddr(12:0) must be used to address the cache in all 
configurations. SCAddr(17) must be used to support the split 
instruction/data configuration. Any of SCAddr(16:14) may be 
omitted because of the fixed width of the physical tag array. 

The SCDChk bus is divided into two fields to cover the upper and 
lower 64 bits of SCData. This form is required to keep the width of 
internal data paths to 64 bits. 

The SCTag bus is divided into three fields, as shown in Figure 11-6. 

Bits 24:22    CacheState 
Bits 21:19    PIDx 
Bits 18:0     Physical_Tag (19 bits) 

Figure 11-6 SCTag Fields 

The SCDCS* and SCTCS* signals are needed to disable reads or writes of the 
data array or tag array when the other array is being accessed. These 
signals are useful for saving power on snoop and invalidate requests 
because accesses to the data array are not necessary. These signals are 
also useful for writing data from the data primary cache to the 
secondary cache because the secondary cache state is not always 
known by the primary cache. 
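
The SCTag layout of Figure 11-6 can be exercised with a pair of pack and unpack helpers. This is a minimal sketch: the helper names are hypothetical, and the field boundaries (CacheState in bits 24:22, PIDx in bits 21:19, Physical_Tag in bits 18:0) are read from the figure.

```python
def pack_sctag(cache_state, pidx, physical_tag):
    """Assemble a 25-bit SCTag value from its three fields."""
    if not 0 <= cache_state < 8:
        raise ValueError("CacheState is a 3-bit field")
    if not 0 <= pidx < 8:
        raise ValueError("PIDx is a 3-bit field")
    if not 0 <= physical_tag < (1 << 19):
        raise ValueError("Physical_Tag is a 19-bit field")
    return (cache_state << 22) | (pidx << 19) | physical_tag


def unpack_sctag(tag):
    """Split a 25-bit SCTag value into (cache_state, pidx, physical_tag)."""
    return ((tag >> 22) & 0x7, (tag >> 19) & 0x7, tag & 0x7FFFF)
```
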

Operation of the Secondary Cache Interface 

The control of the secondary cache is configurable for various clock 
rates and static RAM speeds. All configurable parameters are 
specified in multiples of PClock, which runs at twice the frequency of 
the external system clock, MasterClock. Boot-time mode control 
registers hold the various configuration parameters so they can be 
specified by software when initializing the processor. 

Table 11-6 Secondary Cache Timing Parameters 

Parameter    Range 
tRd1Cyc      4-15 PCycles 
tRd2Cyc      3-15 PCycles 
tDis         2-7 PCycles 
tWr1Dly      1-3 PCycles 
tWr2Dly      1-3 PCycles 
tWrRc        0-1 PCycles 
tWrSUp       3-15 PCycles 
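
A configuration tool could check a proposed set of timing parameters against the ranges in Table 11-6 before building the boot-time mode stream. The sketch below shows one way to do that; the names `TIMING_RANGES` and `out_of_range` are assumptions made for illustration.

```python
# Legal ranges in PCycles, transcribed from Table 11-6.
TIMING_RANGES = {
    "tRd1Cyc": (4, 15),
    "tRd2Cyc": (3, 15),
    "tDis": (2, 7),
    "tWr1Dly": (1, 3),
    "tWr2Dly": (1, 3),
    "tWrRc": (0, 1),
    "tWrSUp": (3, 15),
}


def out_of_range(params):
    """Return the sorted names of parameters that are missing or illegal."""
    bad = []
    for name, (low, high) in TIMING_RANGES.items():
        value = params.get(name)
        if value is None or not low <= value <= high:
            bad.append(name)
    return sorted(bad)
```
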



Read Cycles 

Each secondary cache read sequence begins with the driving of the 
address pins. The output enable signal SCOE* is asserted at the same 
time. 

There are two basic read cycles: a four-word read, and an eight-word 
read. 

For the four-word read, there are two parameters of interest. The first 
parameter is the read sequence cycle time, tRd1Cyc, which specifies the 
time from the driving of the SCAddr bus to the sampling of the 
SCData bus. The second parameter is the cache output disable time, 
tDis, which specifies the time from the end of a read cycle to the start 
of the next write cycle. Figure 11-7 illustrates the four-word read 
sequence. 

[Timing diagram: PCycle, SCAddr bus, SCData/SCTag and SCDChk/SCTChk, 
SCOE*, SCDCS*, and SCTCS*, with the intervals tRd1Cyc and tDis marked.] 

Figure 11-7 Four-Word Read Cycle 

For the eight-word read, there is one additional parameter of interest: 
the time from the first sample point to the second sample point, 
tRd2Cyc. The low-order address bit, SCAddr(0), is changed at the 
same time as the first read sample point. Figure 11-8 illustrates the 
eight-word read sequence. 
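
The relationship between the two read parameters can be made concrete with a short calculation. This is a sketch, not a timing specification: the function names are hypothetical, and the example clock frequency used in the note below is an assumption chosen only to make the arithmetic easy to follow.

```python
def read_pcycles(t_rd1_cyc, t_rd2_cyc=None):
    """PCycles from driving SCAddr to the last SCData sample point.

    Pass only t_rd1_cyc for a four-word read; pass both parameters for an
    eight-word read, where the second sample follows the first by tRd2Cyc.
    """
    if t_rd2_cyc is None:
        return t_rd1_cyc
    return t_rd1_cyc + t_rd2_cyc


def pcycle_ns(master_clock_hz):
    """Length of one PCycle in nanoseconds; PClock runs at twice MasterClock."""
    return 1e9 / (2 * master_clock_hz)
```

With an assumed 50 MHz MasterClock, one PCycle lasts 10 ns, so the minimum settings tRd1Cyc = 4 and tRd2Cyc = 3 correspond to about 40 ns for a four-word read and 70 ns for an eight-word read.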

[Timing diagram: PCycle, SCAddr(17:1), SCAddr(0) (first and second 
address), SCData/SCTag and SCDChk/SCTChk, SCOE*, SCDCS*, and SCTCS*, 
with the intervals tRd1Cyc and tRd2Cyc marked.] 

Figure 11-8 Eight-Word Read Cycle 

All read cycles can be aborted by changing the address. A new cycle 
starts with the edge on which the address is changed. Additionally, 
the period tDis after a read cycle can be interrupted at any time by the 
start of a new read cycle. If a read cycle is aborted by a write cycle, 
SCOE* must be deasserted for the tDis period before the write cycle 
can commence. Read cycles can also be extended indefinitely. There is 
no requirement to change the address at the end of a read cycle. 



Write Cycles 



Like the read sequence, the secondary cache write sequence begins 
with the driving of the address pins. 

There are two basic write cycles: a four-word write cycle and an eight- 
word write cycle. 

For the four-word write, there are several parameters of interest: 

tWr1Dly    delay from the driving of the address to the 
           assertion of SCWR*. 

tWrSUp     delay from driving the second data double 
           word to the deassertion of SCWR*. 

tWrRc      delay from the deassertion of SCWR* to the 
           beginning of the next cycle. 

tWrRc will be zero for most cache designs. Note that the upper data 
double word and the lower data double word will normally be driven 
one cycle apart. This reduces the peak current consumption in the 
output drivers. Figure 11-9 illustrates the four-word write sequence. 
The order of driving the upper versus the lower half on SCData is not 
fixed; either the upper or lower half may be driven first. 



[Timing diagram: PCycle, SCAddr bus, SCData(63:0)/SCDChk(7:0), 
SCData(127:64)/SCDChk(15:8), SCTag(24:0)/SCTChk(6:0), SCWR*, SCOE*, 
SCDCS*, and SCTCS*, with the intervals tWr1Dly, tWrSUp, and tWrRc marked.] 

Figure 11-9 Four-Word Write Cycle 

The eight-word write has one additional parameter: tWr2Dly. This is 
the time from changing the low-order address bit SCAddr(0) to the 
assertion of SCWR* for the second time. The lower half of SCData 
will be driven out on the same edge as the change in SCAddr(0). 
Figure 11-10 illustrates the eight-word write sequence. 



[Timing diagram: PCycle, SCAddr(17:1), SCAddr(0) (first and second 
address), SCData(63:0)/SCDChk(7:0), SCData(127:64)/SCDChk(15:8), 
SCTag(24:0)/SCTChk(6:0), SCWR*, SCOE*, SCDCS*, and SCTCS*, with the 
intervals tWr1Dly, tWr2Dly, tWrSUp, and tWrRc marked.] 

Figure 11-10 Eight-Word Write Cycle 

When receiving data from the system interface, it is possible that the 
first data double word will arrive several cycles before the second data 
double word. In this case, the cache state machine simply waits in a 
state that extends SCWR* until tWrSUp after the driving of the 
second data item. 

Functional Overview 

The operation of the R4000 requires a multilevel reset sequence using 
the VCCOk, ColdReset*, and Reset* inputs. A power-on reset and a cold 
reset accomplish the same thing: they both completely reset the internal 
state machine of the R4000. A warm reset also resets the internal state 
machine; however, the processor internal state is preserved. 
Fundamental operational modes for the processor are initialized by 
the initialization interface. The initialization interface is a serial 
interface operating at a frequency of MasterClock divided by 256. The 
low-frequency operation allows the initialization information to be 
stored in a low-cost EPROM. 

Immediately after the VCCOk signal is asserted, the processor reads 
a serial bit stream of 256 bits on ModeIn to initialize all fundamental 
operational modes. After initialization is complete, the processor 
continues to drive the serial clock output, but no further initialization 
bits are read. 

Initialization Interface Operation 

1. While VCCOk is deasserted, the ModeClock output is 
held asserted. 

2. The processor synchronizes the ModeClock output at the 
time VCCOk is asserted, and the first rising edge of the 
ModeClock will occur 256 MasterClock clock cycles after 
VCCOk is asserted. 

3. After each rising edge of the ModeClock, the next bit of 
the initialization bit stream must be presented at the 
ModeIn input. The processor will sample exactly 256 
initialization bits from the ModeIn input. 

ModeIn: (i) Serial boot mode data in. 

ModeClock: (o) Serial boot mode data clock out at the 
MasterClock frequency divided by 256. 

Refer to Figure 12-1 and Figure 12-2 for timing relationships. 
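
Because ModeClock runs at MasterClock divided by 256 and 256 bits are read, the serial initialization takes roughly 256 x 256 MasterClock cycles. The sketch below works that out; it ignores the exact sampling edges shown in the timing figures, and the function name and the example frequency are assumptions.

```python
def mode_stream_duration(master_clock_hz, bits=256, divisor=256):
    """Return (master_clock_cycles, seconds) to shift in the mode stream.

    One ModeClock period spans `divisor` MasterClock cycles, and one
    initialization bit is read per ModeClock period.
    """
    cycles = bits * divisor
    return cycles, cycles / master_clock_hz
```

For an assumed 50 MHz MasterClock this gives 65536 cycles, or about 1.3 milliseconds.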

Boot-Time Modes 

The correspondence between bits of the initialization bit stream and 
processor mode settings is illustrated in Table 12-1. Bit 0 of the bit 
stream is the bit presented to the processor when VCCOk is asserted. 

Table 12-1 Boot Time Modes 

Serial Bit   Value   Mode Setting 
0                    BlkOrder: Secondary Cache Mode block read response ordering. 
             0       Sequential ordering. 
             1       Sub-block ordering. 
1                    EIBParMode: Specifies nature of system interface check bus. 
             0       SECDED error checking and correcting mode. 
             1       Byte parity. 
2                    EndBIt: Specifies byte ordering. 
             0       Little Endian ordering. 
             1       Big Endian ordering. 
3                    DShMdDis: Dirty shared mode, enables transition to dirty shared 
                     state on processor update successful. 
             0       Dirty shared mode enabled. 
             1       Dirty shared mode disabled. 
4                    NoSCMode: Specifies presence of secondary cache. 
             0       Secondary cache present. 
             1       No secondary cache present. 
5:6                  SysPort: System Interface port width, bit 6 most significant. 
             0       64 bits. 
             1-3     Reserved. 
7                    SC64BitMd: Secondary cache interface port width. 
             0       128 bits. 
             1       Reserved. 
8                    EISpltMd: Specifies secondary cache organization. 
             0       Secondary cache unified. 
             1       Reserved. 
9:10                 SCBlkSz: Secondary cache line size, bit 10 most significant. 
             0       4 words. 
             1       8 words. 
             2       16 words. 
             3       32 words. 

Table 12-2 Boot Time Modes 

Serial Bit   Value   Mode Setting 
11:14                XmitDatPat: System interface data rate, bit 14 most significant. 
             0       D 
             1       DDx 
             2       DDxx 
             3       DxDx 
             4       DDxxx 
             5       DDxxxx 
             6       DxxDxx 
             7       DDxxxxxx 
             8       DxxxDxxx 
             9-15    Reserved. 
15:17                SysCkRatio: PClock to SClock divisor, frequency relationship between 
                     SClock, RClock, and TClock and PClock, bit 17 most significant. 
             0       Divide by 2. 
             1       Divide by 3. 
             2       Divide by 4. 
             3-7     Reserved. 
18                   Reserved. 
19                   TimIntDis: Timer Interrupt enable allows timer interrupts, otherwise 
                     the interrupt used by the timer becomes a general-purpose interrupt. 
             0       Timer Interrupt enabled. 
             1       Timer Interrupt disabled. 
20                   PotUpdDis: Potential update enable allows potential updates to be 
                     issued. Otherwise only compulsory updates are issued. 
             0       Potential updates enabled. 
             1       Potential updates disabled. 
21:24                TWrSUp: Secondary cache write deassertion delay, TWrSUp in PCycles, 
                     bit 24 most significant. 
             0-2     Undefined. 
             3-15    Number of PCLK cycles; Min 3, Max 15. 
25:26                TWr2Dly: Secondary cache write assertion delay 2, TWr2Dly in PCycles, 
                     bit 26 most significant. 
             0       Undefined. 
             1-3     Number of PCLK cycles; Min 1, Max 3. 






Table 12-3 Boot Time Modes 

Serial Bit   Value   Mode Setting 
27:28                TWr1Dly: Secondary cache write assertion delay 1, TWr1Dly in PCycles, 
                     bit 28 most significant. 
             0       Undefined. 
             1-3     Number of PCLK cycles; Min 1, Max 3. 
29                   TWrRc: Secondary cache write recovery time, TWrRc in PCycles, either 
                     0 or 1 cycles. 
             0       0 cycle. 
             1       1 cycle. 
30:32                TDis: Secondary cache disable time, TDis in PCycles, bit 32 most 
                     significant. 
             0-1     Undefined. 
             2-7     Number of PCLK cycles; Min 2, Max 7. 
33:36                TRd2Cyc: Secondary cache read cycle time 2, TRd2Cyc in PCycles, 
                     bit 36 most significant. 
             0-2     Undefined. 
             3-15    Number of PCLK cycles; Min 3, Max 15. 
37:40                TRd1Cyc: Secondary cache read cycle time 1, TRd1Cyc in PCycles, 
                     bit 40 most significant. 
             0-3     Undefined. 
             4-15    Number of PCLK cycles; Min 4, Max 15. 
41:45                Reserved. 
46                   Pkg179: R4000 Package type. 
             0       Large (447 pin). 
             1       Small (179 pin). 
47:49                CycDivisor: This mode determines the clock divisor for the reduced 
                     power mode. When the RP bit in the Status Register is set to one, 
                     the pipeline clock is divided by one of the following values. 
                     Bit 49 is most significant. 
             0       Divide by 2. 
             1       Divide by 4. 
             2       Divide by 8. 
             3       Divide by 16. 
             4-7     Reserved. 
50:52                Drv0_50, Drv0_75, Drv1_00: Drive the outputs out in N x MasterClock 
                     period. Bit 52 most significant. Those combinations not defined 
                     below are reserved. 
             1       Drive at 0.50 x MasterClock period. 
             2       Drive at 0.75 x MasterClock period. 
             4       Drive at 1.00 x MasterClock period. 

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Table 12-4 Boot Time Modes 



Serial Bit  Value   Mode Setting

53:56               InitP: Initial values for the state bits that determine
                    the pull-down Δi/Δt and switching speed of the output
                    buffers. Bit 53 is the most significant.
            0       Fastest pull-down rate.
            1-14    Intermediate pull-down rates.
            15      Slowest pull-down rate.

57:60               InitN: Initial values for the state bits that determine
                    the pull-up Δi/Δt and switching speed of the output
                    buffers. Bit 57 is the most significant.
            0       Slowest pull-up rate.
            1-14    Intermediate pull-up rates.
            15      Fastest pull-up rate.

61                  EnblDPLLR: Enables the negative feedback loop that
                    determines the Δi/Δt and switching speed of the output
                    buffers only during ColdReset.
            0       Disable Δi/Δt mechanism.
            1       Enable Δi/Δt mechanism.

62                  EnblDPLL: Enables the negative feedback loop that
                    determines the Δi/Δt and switching speed of the output
                    buffers during ColdReset and during normal operation.
            0       Disable Δi/Δt control mechanism.
            1       Enable Δi/Δt control mechanism.

63                  DsblPLL: Enables PLLs that match MasterIn and produce
                    RClock, TClock, SClock and the internal clocks.
            0       Enable PLLs.
            1       Disable PLLs.

64                  SRTristate: Controls when output-only pins are
                    tristated.
            0       Only when ColdReset* is asserted.
            1       When Reset* or ColdReset* are asserted.

65:255              Reserved. Scan in zeros.



Selecting a reserved value results in undefined processor 

behavior. 

Bits 65 to 255 are reserved bits. 

Zeros must be scanned in for all reserved bits. 
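
The bit-ordering rules in Table 12-4 are easy to get wrong, because some fields name their highest-numbered serial bit as most significant (for example bits 50:52) while others name the lowest-numbered bit (for example bits 53:56). The following Python sketch shows one way to place field values into a 256-entry stream under those rules. The helper names and the list representation are illustrative assumptions, not part of the R4000 documentation, and the example fills in only a few fields; a real stream must also give every other field a legal value.

```python
# Illustrative sketch: field positions come from Table 12-4, but the
# helper names and the list-based stream are assumptions.

STREAM_BITS = 256  # serial bits 0..255


def set_field(stream, first, last, value, msb):
    """Store value in serial bits first..last; msb names the most significant bit."""
    width = last - first + 1
    if not 0 <= value < (1 << width):
        raise ValueError("value does not fit in the field")
    if msb not in (first, last):
        raise ValueError("msb must be one end of the field")
    for i in range(width):
        # i counts up from the least significant bit of the field
        pos = last - i if msb == first else first + i
        stream[pos] = (value >> i) & 1


def partial_example_stream():
    """Fill in a few fields only; all other bits (including reserved) stay 0."""
    stream = [0] * STREAM_BITS
    set_field(stream, 47, 49, 2, msb=49)    # CycDivisor: divide by 8
    set_field(stream, 50, 52, 1, msb=52)    # drive at 0.50 x MasterClock period
    set_field(stream, 53, 56, 15, msb=53)   # InitP: slowest pull-down rate
    set_field(stream, 57, 60, 0, msb=57)    # InitN: slowest pull-up rate
    return stream
```

With this convention, the 0.50 x MasterClock drive setting puts a one in serial bit 50 and zeros in bits 51 and 52.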
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Reset Operation 

The R4000 supports three types of resets: 

• Power-on Reset: Starts from the power supply turning on.

• Cold Reset: Restarts all clocks, but the power supply remains
stable. Processor operating parameters do not change.

• Warm Reset: Restarts the processor, but does not affect clocks.

The operation of each type of reset is described in a subsection below.



Reset Signal Summary 

VCCOk:          When asserted, VCCOk indicates to the R4000 that
                the +5 volt power supply has been above 4.75 volts
                for more than 100 milliseconds and will remain
                stable. The assertion of VCCOk initiates the
                reading of the boot-time mode control serial
                stream.

ColdReset*:     ColdReset* must be asserted for a power-on reset
                or a cold reset. The clocks SClock, TClock, and
                RClock begin to cycle and are synchronized with
                the de-assertion edge of ColdReset*. ColdReset*
                must be de-asserted synchronously with
                MasterClock.

Reset*:         Reset* must be asserted for any reset sequence.
                It may be asserted synchronously or
                asynchronously for a cold reset, or synchronously
                to initiate a warm reset. Reset* must be
                de-asserted synchronously with MasterClock.



Power-on Reset 



The sequence for a power-on reset is: 

1. Stable VCC of at least 4.75 volts from the +5 volt power supply
is applied to the processor. A stable continuous system clock
at the processor's desired operational frequency is also
supplied.

2. After at least 100 milliseconds of stable VCC and MasterClock,
the VCCOk input to the processor may be asserted. The assertion
of VCCOk causes the processor to initialize the operating
parameters. After the mode bits have been read in, the processor
allows its internal phase-locked loops to lock, stabilizing the
processor internal clock, PClock, the SyncOut-SyncIn clock path
and the master clock output, MasterOut.

3. Once the boot-time mode control serial data stream has been
read by the processor, the ColdReset* input may be de-asserted.
ColdReset* must remain asserted for at least 64 MasterClock
cycles after the assertion of VCCOk. ColdReset* must be
de-asserted synchronously with MasterClock.

4. The de-assertion edge of ColdReset* is used to synchronize
the edges of SClock, TClock, and RClock, potentially across
multiple processors in a multiprocessor system.

5. After ColdReset* is de-asserted and SClock, TClock and
RClock have stabilized, Reset* is de-asserted to allow the
processor to begin to run. Reset* must be held asserted for at
least 64 MasterClock cycles after the de-assertion of
ColdReset*. Reset* must be de-asserted synchronously with
MasterClock.

ColdReset* must be asserted when VCCOk asserts. The behavior of
the processor is undefined if VCCOk asserts while ColdReset* is
de-asserted.
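
The minimum intervals in the power-on sequence can be collected into a small checker, for example to validate a simulated reset controller. The Python sketch below is illustrative only: the record type, field names, and message strings are assumptions, and it checks just the numeric minima stated in the steps above, not synchronization with MasterClock.

```python
from dataclasses import dataclass

MIN_STABLE_MS = 100          # stable VCC and MasterClock before VCCOk
MIN_COLDRESET_CYCLES = 64    # MasterClock cycles after VCCOk assertion
MIN_RESET_CYCLES = 64        # MasterClock cycles after ColdReset* de-assertion


@dataclass
class PowerOnTrace:
    """Hypothetical summary of one observed power-on reset sequence."""
    stable_before_vccok_ms: float
    coldreset_asserted_at_vccok: bool
    coldreset_cycles_after_vccok: int
    reset_cycles_after_coldreset: int


def power_on_violations(trace):
    """Return a list of human-readable violations; empty means none found."""
    found = []
    if trace.stable_before_vccok_ms < MIN_STABLE_MS:
        found.append("VCCOk asserted before 100 ms of stable VCC and MasterClock")
    if not trace.coldreset_asserted_at_vccok:
        found.append("ColdReset* was not asserted when VCCOk asserted")
    if trace.coldreset_cycles_after_vccok < MIN_COLDRESET_CYCLES:
        found.append("ColdReset* released fewer than 64 cycles after VCCOk")
    if trace.reset_cycles_after_coldreset < MIN_RESET_CYCLES:
        found.append("Reset* released fewer than 64 cycles after ColdReset*")
    return found
```

A trace that meets every minimum yields an empty list; each missed requirement adds one entry.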



Cold Reset

A cold reset can begin once the processor has read the initialization
data stream, causing the processor to start with the Reset exception. A
cold reset requires the same sequence as a power-on reset except that
the power is presumed to have been stable before the assertion of the
reset inputs and the de-assertion of VCCOk. VCCOk must be
de-asserted for a minimum of 100 milliseconds before reassertion, to
begin the reset sequence.

Warm Reset

To effect a warm reset, the Reset* input may be asserted
synchronously with MasterClock and held asserted for at least 64
MasterClock cycles before being de-asserted synchronously with
MasterClock. The processor internal clocks, PClock and SClock, and
the system interface clocks, TClock and RClock, are not affected by
a warm reset, and the boot-time mode control serial data stream is not
read by the processor on a warm reset. A warm reset causes the
processor to start with the Soft Reset exception.

The master clock output, MasterOut, is provided for use in generating 
the reset related signals for the processor that must be synchronous 
with MasterClock. 

After a power on reset, cold reset, or warm reset, all processor internal 
state machines are reset, and the processor begins execution at the 
reset vector. All processor internal states are preserved during a warm 
reset, although the precise state of the caches will depend on whether 
a cache miss sequence has been interrupted by resetting the processor 
state machines. 
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The following timing diagrams illustrate a power-on reset, cold reset, 
and warm reset.
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Figure 12-1 Power-On Reset 
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Figure 12-2 Cold Reset 
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Figure 12-3 Warm Reset 
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The R4000 processor provides a boundary scan interface using the 
industry standard JTAG protocol. 

JTAG Interface Signal Summary 

JTDI: (i) JTAG serial data in. 

JTDO: (o) JTAG serial data out. 

JTMS: (i) JTAG command signal. 

JTCK: (i) JTAG serial clock input.

JTAG Functionality 

The JTAG boundary scan mechanism provides a capability for testing 
the interconnect between the R4000 processor, the printed circuit 
board to which it is attached, and the other components on the board. 
In addition the JTAG boundary scan mechanism provides a 
rudimentary capability for low-speed logical testing of the secondary 
cache RAMs. The JTAG boundary scan mechanism does not provide 
any capability for testing the R4000 processor itself.

The JTAG boundary scan mechanism is compatible with JTAG
specifications. The R4000 processor contains the JTAG TAP
controller, JTAG Instruction Register, JTAG Boundary Scan Register
and JTAG Bypass Register, and executes the standard JTAG
EXTEST operation associated with external test functionality testing.
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JTAG Test Access Port (TAP) 

The JTAG Test Access Port (TAP) consists of the four pins described
above. Data is serially scanned into one of the three registers
(Instruction Register, Bypass Register, Boundary Scan Register) from
the JTDI pin, and is scanned out from the selected one of these
registers onto the JTDO pin. The JTDI input feeds the LSB of the
selected register, and the MSB of the selected register appears on the
JTDO output. The JTMS input controls the state transitions of the
main TAP controller state machine.

Data on the JTDI and JTMS pins is sampled on the rising edge of the
JTCK input clock signal. Data on the JTDO pin changes on the falling
edge of the JTCK clock signal.

JTAG TAP Controller 

The R4000 implements the 16-state JTAG TAP controller as defined in 
the IEEE JTAG specification. 

The TAP controller state machine can be put in its Reset state in one of 
two ways. Deassertion of the VCCOk input will reset the TAP 
controller. Keeping the JTMS input signal asserted through five 
consecutive rising edges of the JTCK clock input will also send the 
TAP controller state machine into its Reset state. In either case, 
keeping JTMS asserted will maintain the Reset state. 
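
The five-clock reset property follows from the shape of the standard 16-state TAP state machine. The Python sketch below encodes the usual IEEE 1149.1 transition table and confirms that five JTCK rising edges with JTMS high reach the reset state from every state. The state names are the conventional IEEE ones (the text above calls Test-Logic-Reset simply "Reset"); the table is written from the standard, not taken from R4000-specific documentation.

```python
# Next state for each TAP state, indexed by the JTMS value (0 or 1)
# sampled on a rising edge of JTCK. Standard IEEE 1149.1 state diagram.
TAP_NEXT = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle":    ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan":   ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR":       ("Shift-DR", "Exit1-DR"),
    "Shift-DR":         ("Shift-DR", "Exit1-DR"),
    "Exit1-DR":         ("Pause-DR", "Update-DR"),
    "Pause-DR":         ("Pause-DR", "Exit2-DR"),
    "Exit2-DR":         ("Shift-DR", "Update-DR"),
    "Update-DR":        ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan":   ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR":       ("Shift-IR", "Exit1-IR"),
    "Shift-IR":         ("Shift-IR", "Exit1-IR"),
    "Exit1-IR":         ("Pause-IR", "Update-IR"),
    "Pause-IR":         ("Pause-IR", "Exit2-IR"),
    "Exit2-IR":         ("Shift-IR", "Update-IR"),
    "Update-IR":        ("Run-Test/Idle", "Select-DR-Scan"),
}


def clock_tap(state, jtms_bits):
    """Advance the TAP state once per JTMS sample and return the final state."""
    for bit in jtms_bits:
        state = TAP_NEXT[state][bit]
    return state


def five_ones_always_reset():
    """True if five JTMS=1 clocks reach Test-Logic-Reset from every state."""
    return all(
        clock_tap(start, [1] * 5) == "Test-Logic-Reset" for start in TAP_NEXT
    )
```

Four clocks are not always enough: starting from Shift-DR, four JTMS=1 clocks leave the controller in Select-IR-Scan.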

Instruction Register 

The R4000's JTAG Instruction Register is three bits wide and is 
encoded as follows: 



MSB ... LSB     Selected Data Register

0   0   0       Boundary Scan Register (external test only)
X   X   1       Bypass Register
X   1   X       Bypass Register
1   X   X       Bypass Register



The Instruction Register comprises two stages: the shift register stage
and the parallel output latch. When the TAP controller is in the Reset
state, the value 7 (111) is loaded into the parallel output latch, thus
selecting the Bypass Register as the default. When the TAP controller
is in the Capture-IR state, the value 4 (100) is loaded into the shift
register stage. When the TAP controller is in the Shift-IR state, data is
serially shifted into the shift register stage of the Instruction Register
from the JTDI input pin, and the MSB of the Instruction Register's
shift register stage is shifted out onto the JTDO pin. When the TAP
controller is in the Update-IR state, the current data in the shift register
stage is loaded into the parallel output latch.
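
The two-stage behavior can be modeled in a few lines. The Python sketch below is an illustration only; the function names are invented here. It assumes that JTDI feeds the LSB and the MSB drives JTDO, as stated for the TAP registers, and that the all-zero encoding selects the Boundary Scan Register, since any encoding containing a 1 selects the Bypass Register.

```python
IR_WIDTH = 3
IR_MASK = (1 << IR_WIDTH) - 1
RESET_LATCH_VALUE = 0b111    # loaded into the output latch in the Reset state
CAPTURE_IR_VALUE = 0b100     # loaded into the shift stage in Capture-IR


def selected_register(latch):
    """Decode the parallel output latch into the selected data register."""
    return "boundary_scan" if (latch & IR_MASK) == 0 else "bypass"


def shift_ir(shift_stage, jtdi_bits):
    """Model Shift-IR: return (new shift stage, bits seen on JTDO)."""
    jtdo_bits = []
    for bit in jtdi_bits:
        jtdo_bits.append((shift_stage >> (IR_WIDTH - 1)) & 1)    # MSB out
        shift_stage = ((shift_stage << 1) | (bit & 1)) & IR_MASK  # JTDI into LSB
    return shift_stage, jtdo_bits


def load_instruction(jtdi_bits):
    """Capture-IR, shift three bits, then Update-IR; return (latch, JTDO bits)."""
    stage, jtdo_bits = shift_ir(CAPTURE_IR_VALUE, jtdi_bits)
    return stage, jtdo_bits    # Update-IR copies the shift stage to the latch
```

In this model the first bit presented on JTDI ends up in the MSB of the register, and the captured value 100 appears on JTDO as 1, 0, 0.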



Bypass Register 



The Bypass Register is one bit wide. When the TAP controller is in the 
Shift-DR (Bypass) state, the data on the JTDI pin is shifted into the 
Bypass Register, and the Bypass Register's output is shifted out onto 
the JTDO output pin. 



Boundary Scan Register 



The Boundary Scan Register is 319 bits wide. The three most-significant
bits control the output enables on the various bidirectional
buses. The most-significant bit is the JTAG output enable bit for the
SysAD, SysADC, SysCmd, and SysCmdP buses. The next most-significant
bit is the JTAG output enable for the SCData and SCDChk
buses. The third most-significant bit is the JTAG output enable for the
SCTag and SCTChk buses. The remaining 316 bits correspond to 316
signal pads of the R4000.
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The scan order of these 316 scan bits is listed below starting from JTDI 
and ending with JTDO. 



1. SCDChk[13]
2. SysADC[1]
3. SCDChk[1]
4. SysADC[5]
5. SCDChk[5]
6. Status[0]
7. Status[1]
8. Status[2]
9. Status[3]
10. IvdErrB
11. Status[4]
12. IvdAckB
13. Status[5]
14. Status[6]
15. Status[7]
16. SCDChk[7]
17. SysADC[7]
18. SCDChk[3]
19. SysADC[3]
20. SCDChk[15]
21. VCCOk
22. SCTag[16]
23. SCDChk[11]
24. SCData[63]
25. SysAD[63]
26. SCData[31]
27. SysAD[31]
28. SCData[127]
29. SCTag[17]
30. SCData[95]
31. SCData[62]
32. SysAD[62]
33. SCData[30]
34. SysAD[30]
35. SCData[126]
36. SCTag[18]
37. SCData[94]
38. RClock[1..0] (share the same JTAG bit)
39. SCTag[19]
40. SCData[61]
41. SysAD[61]
42. SCData[29]
43. SysAD[29]
44. SCData[125]
45. ResetB
46. SCTag[20]
47. SCData[93]
48. SCData[60]
49. SysAD[60]
50. SCData[28]
51. SysAD[28]
52. SCData[124]
53. ColdResetB
54. SCTag[21]
55. SCData[92]
56. SCData[59]
57. SysAD[59]
58. SCData[27]
59. SysAD[27]
60. SCData[123]
61. IOIn
62. SCTag[22]
63. SCData[91]
64. SCData[58]
65. SysAD[58]
66. SCData[26]
67. SysAD[26]
68. SCData[122]
69. IOOut
70. SCTag[23]
71. SCData[90]
72. SCData[57]
73. SysAD[57]
74. SCData[25]
75. SysAD[25]
76. SCData[121]
77. GrpRunB
78. SCTag[24]
79. SCData[89]
80. SCData[56]
81. SysAD[56]
82. SCData[24]
83. SysAD[24]
84. SCData[120]
85. GrpStallB
86. SCTChk[0]
87. SCData[88]
88. SCDChk[6]
89. SysADC[6]
90. SCDChk[2]
91. SysADC[2]
92. SCDChk[14]
93. NMIB
94. SCTChk[1]
95. SCDChk[10]
96. SCData[55]
97. SysAD[55]
98. SCData[23]
99. SysAD[23]
100. SCData[119]
101. ReleaseB
102. SCTChk[2]
103. SCData[87]
104. SCData[54]
105. SysAD[54]
106. SysAD[22]
107. ModeIn
108. SCData[22]
109. RdRdyB
110. SCData[118]
111. SCData[86]
112. SCData[53]
113. SysAD[53]
114. SCData[21]
115. SysAD[21]
116. SCData[117]
117. ExtRqstB
118. SCTChk[3]
119. SCData[85]
120. SCData[52]
121. SysAD[52]
122. SCData[20]
123. SysAD[20]
124. SCData[116]
125. ValidOutB
126. SCTChk[4]
127. SCData[84]
128. SCData[51]
129. SysAD[51]
130. SCData[19]
131. SysAD[19]
132. SCData[115]
133. ValidInB
134. SCTChk[5]
135. SCData[83]
136. SCAddr0W,X (share the same JTAG bit)
137. SCAddr0Y,Z (share the same JTAG bit)
138. SCAddr[1]
139. SCData[50]
140. SysAD[50]
141. SCData[18]
142. SysAD[18]
143. SCData[114]
144. IntB[0]
145. SCTChk[6]
146. SCData[82]
147. SCData[49]
148. SysAD[49]
149. SCData[17]
150. SysAD[17]
151. SCData[113]
152. SCAddr[2]/IntB[1]
153. SCAddr[3]
154. SCData[81]
155. SCData[48]
156. SysAD[48]
157. SCData[16]
158. SysAD[16]
159. SCData[112]
160. SCAddr[4]/IntB[2]
161. SCAddr[5]
162. SCData[80]
163. SCAddr[6]
164. SCAddr[7]
165. SCAddr[8]
166. SCAddr[9]
167. SCAddr[10]
168. SCAddr[11]
169. SC64Addr
170. SCAddr[12]
171. SCAddr[13]
172. SCAddr[14]
173. SCAddr[15]
174. SCAddr[16]
175. SCAddr[17]
176. SCData[64]
177. SCAPar[0]
178. SCAPar[1]/IntB[3]
179. SCData[96]
180. SysAD[0]
181. SCData[0]
182. SysAD[32]
183. SCData[32]
184. SCData[65]
185. SCAPar[2]
186. SCOEB/IntB[4]
187. SCData[97]
188. SysAD[1]
189. SCData[1]
190. SysAD[33]
191. SCData[33]
192. SCData[66]
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193. SCDCSB 

194. SCTCSB/IntB[5]
195. SCData[98]
196. SysAD[2]
197. SCData[2]
198. SysAD[34]
199. SCData[34]
200. SCTag[0]
201. SCWrBW,X (share the same JTAG bit)
202. SCWrBY,Z (share the same JTAG bit)
203. SCData[67]
204. SCTag[1]
205. SysCmd[0]
206. SCData[99]
207. SysAD[3]
208. SCData[3]
209. SysAD[35]
210. SCData[35]
211. SCData[68]
212. SCTag[2]
213. SysCmd[1]
214. SCData[100]
215. SysAD[4]
216. SCData[4]
217. SysAD[36]
218. SCData[36]
219. SCData[69]
220. SCTag[3]
221. SysCmd[2]
222. SCData[101]
223. SysAD[5]
224. SCData[5]
225. SysAD[37]
226. SCData[37]
227. SCData[70]
228. WrRdyB
229. ModeClock
230. SCData[102]
231. SysAD[6]
232. SCData[6]
233. SysAD[38]
234. SCData[38]
235. SCData[71]
236. SCTag[4]
237. SysCmd[3]
238. SCData[103]
239. SysAD[7]
240. SCData[7]
241. SysAD[39]
242. SCData[39]
243. SCDChk[8]
244. SCTag[5]
245. SysCmd[4]
246. SCDChk[12]
247. SysADC[0]
248. SCDChk[0]
249. SysADC[4]
250. SCDChk[4]
251. SCData[72]
252. SCTag[6]
253. SysCmd[5]
254. SCData[104]
255. SysAD[8]
256. SCData[8]
257. SysAD[40]
258. SCData[40]
259. SCData[73]
260. SCTag[7]
261. SysCmd[6]
262. SCData[105]
263. SysAD[9]
264. SCData[9]
265. SysAD[41]
266. SCData[41]
267. SCData[74]
268. SCTag[8]
269. SysCmd[7]
270. SCData[106]
271. SysAD[10]
272. SCData[10]
273. SysAD[42]
274. SCData[42]
275. SCData[75]
276. SCTag[9]
277. SysCmd[8]
278. SCData[107]
279. SysAD[11]
280. SCData[11]
281. SysAD[43]
282. SCData[43]
283. SCData[76]
284. SCTag[10]
285. SysCmdP
286. SCData[108]
287. SysAD[12]
288. SCData[12]
289. SysAD[44]
290. SCData[44]
291. SCData[77]
292. SCTag[11]
293. FaultB
294. SCData[109]
295. SysAD[13]
296. SCData[13]
297. SysAD[45]
298. SCData[45]
299. SCTag[12]
300. TClock[1..0] (share the same JTAG bit)
301. SCData[78]
302. SCTag[13]
303. SCData[110]
304. SysAD[14]
305. SCData[14]
306. SysAD[46]
307. SCData[46]
308. SCData[79]
309. SCTag[14]
310. SCData[111]
311. SysAD[15]
312. SCData[15]
313. SysAD[47]
314. SCData[47]
315. SCDChk[9]
316. SCTag[15]

317. SCTag_OE (JTAG output enable control for SCTag and SCTChk buses)

318. SCData_OE (JTAG output enable control for SCData and SCDChk buses)

319. SysAD_OE (JTAG output enable control for SysAD, SysADC, SysCmd and SysCmdP buses)



When the TAP controller is in the Reset state, the three most
significant bits of the Boundary Scan Register are set to "0" (the
default JTAG output enable control on all the bidirectional pins is to
disable the outputs). When the TAP controller is in the Capture-DR
(Boundary Scan) state, the data currently present on all the R4000's
input and I/O pins is latched into the Boundary Scan Register. The
Boundary Scan Register bits corresponding to output pins are
arbitrary in this state and must not be checked during the scan-out
process. When the TAP controller is in the Shift-DR (Boundary Scan)
state, data is serially shifted into the Boundary Scan Register from the
JTDI pin, and the contents of the Boundary Scan Register are shifted
out onto the JTDO pin. When the TAP controller is in the Update-DR
(Boundary Scan) state, the current data in the Boundary Scan Register
is latched into its parallel output latch, and the bits corresponding to
output pins and those I/O pins whose outputs are enabled (by the
three MSBs of the Boundary Scan Register) are enabled onto the
R4000's pins.
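
As a rough illustration of the register layout, the Python sketch below builds a 319-entry image of the Boundary Scan Register from pad values and the three output enable controls. It assumes that scan position n in the list above corresponds to element n-1 of the image, so that position 319 (the SysAD output enable) is the most significant bit. The function name and the dictionary input are hypothetical.

```python
BSR_WIDTH = 319
PAD_BITS = 316
SCTAG_OE_POS = 317    # output enable for SCTag and SCTChk
SCDATA_OE_POS = 318   # output enable for SCData and SCDChk
SYSAD_OE_POS = 319    # output enable for SysAD, SysADC, SysCmd, SysCmdP (MSB)


def make_bsr_image(pad_values, sysad_oe=False, scdata_oe=False, sctag_oe=False):
    """Return a list of 319 bits; index 0 is scan position 1.

    pad_values maps a scan position (1..316) to 0 or 1. Unlisted pads are 0.
    The output enables default to 0, matching the Reset-state default.
    """
    image = [0] * BSR_WIDTH
    for position, value in pad_values.items():
        if not 1 <= position <= PAD_BITS:
            raise ValueError("pad position must be in 1..316")
        image[position - 1] = value & 1
    image[SCTAG_OE_POS - 1] = int(sctag_oe)
    image[SCDATA_OE_POS - 1] = int(scdata_oe)
    image[SYSAD_OE_POS - 1] = int(sysad_oe)
    return image
```

For example, driving a one on SysAD[0] (scan position 180) requires both that pad bit and the SysAD output enable at position 319 to be set.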

Implementation Specific Details 

• The MasterClock, MasterOut, Syncln, and SyncOut pads do 
not have JTAG. 

• Some pairs of output pads share a single JTAG bit. These are:

SCAddr0W and SCAddr0X
SCAddr0Y and SCAddr0Z
SCWrBW and SCWrBX
SCWrBY and SCWrBZ
TClock[0] and TClock[1]
RClock[0] and RClock[1]

• All input pad data is first latched into a Processor Clock-
based register in the pad cell before it is captured into the
Boundary Scan Register in the Capture-DR (Boundary Scan)
state. When the phase-locked loop is disabled, the processor
clock is half the frequency of MasterClock. Therefore, the data
setup required at the input pads is greater than two
MasterClock periods before the rising edge of JTCK when
the TAP controller is in the Capture-DR (Boundary Scan) state.

• The output enable controls generated from the three most 
significant bits of the Boundary Scan Register are latched into a 
Processor Clock-based register before they actually enable the 
data onto the pads. Therefore, the delay from the rising edge of 
JTCK in the Update-DR (Boundary Scan) state to data valid at 
the output pins of the chip is greater than two MasterClock 
periods. 
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The R4000 processor supports six hardware interrupts, two software 
interrupts, and a non-maskable interrupt. The processor's six 
hardware interrupts are accessible via external write requests in the 
R4000SC, R4000MC and R4000PC configurations, and by dedicated 
pins as well in the R4000PC configuration. The non-maskable 
interrupt is accessible via external write requests and a dedicated pin 
in the R4000SC, R4000MC and R4000PC configurations. 

External writes to the processor are directed, based on a processor
internal address map, to various processor internal resources. An
external write to any address with SysAD[6..4] = 0 writes to an
architecturally transparent register called the Interrupt register.
During the data cycle, SysAD[22..16] are the write enables for the
seven individual Interrupt register bits and SysAD[6..0] are the values
to be written into these bits. This allows any subset of the Interrupt
register to be set and cleared with a single write request.

• In the R4000SC and R4000MC, bit 5 of the Interrupt register is
multiplexed with the TimerInterrupt signal and the result is
directly readable as bit 15 of the Cause register. Bits 4:1 of the
Interrupt register are directly readable as bits 14:11 of the Cause
register. Bit 0 of the Interrupt register is ORed with the Int*[0]
pin, and the result is directly readable as bit 10 of the Cause
register.

• In the R4000PC, bit 5 of the Interrupt register is ORed with the
Int*[5] pin and then multiplexed with the TimerInterrupt signal
and the result is directly readable as bit 15 of the Cause register.
Bits 4:0 of the Interrupt register are bit-wise ORed with the
current value of the interrupt pins Int*[4:0] and the result is
directly readable as bits 14:10 of the Cause register.
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In both configurations, bit 6 of the Interrupt Register is ORed 
with the inverted value of the non-maskable interrupt pin NMI* 
to form the non-maskable interrupt input to the processor. 
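
The write-enable scheme means one external write can set some Interrupt register bits and clear others while leaving the rest untouched. The Python sketch below illustrates that update rule and the R4000PC mapping of bits 4:0 into Cause bits 14:10. It is a behavioral illustration with invented function names; it treats the Int* pins as an "asserted" mask (1 meaning the active-low pin is asserted) and does not model the TimerInterrupt multiplexer on bit 5.

```python
IREG_MASK = 0x7F  # Interrupt register bits 6..0


def write_interrupt_register(current, data_word):
    """Apply an external write: SysAD[22..16] are enables, SysAD[6..0] values."""
    enables = (data_word >> 16) & IREG_MASK
    values = data_word & IREG_MASK
    # Enabled bits take the written value; other bits keep their state.
    return (current & ~enables & IREG_MASK) | (values & enables)


def cause_bits_14_10_r4000pc(interrupt_reg, int_pins_asserted):
    """OR Interrupt register bits 4:0 with asserted Int*[4:0]; place in Cause 14:10."""
    pending = (interrupt_reg | int_pins_asserted) & 0x1F
    return pending << 10


def nmi_input(interrupt_reg, nmi_pin_asserted):
    """Bit 6 ORed with the (inverted, active-low) NMI* pin."""
    return bool((interrupt_reg >> 6) & 1) or bool(nmi_pin_asserted)
```

For instance, starting from 0b0000110, a write with enables 0b0000101 and values 0b0000001 sets bit 0, clears bit 2, and leaves bit 1 alone, giving 0b0000011.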
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The processor provides sixteen check bits for the secondary cache data 
bus SCDChk(15:0), seven check bits for the secondary cache tag bus 
SCTChk(6:0), and eight check bits for the system interface address 
and data bus SysADC(7:0). The sixteen check bits for the secondary 
cache data bus are organized as eight check bits for the upper sixty- 
four bits of the data bus, and eight check bits for the lower sixty-four 
bits of the data bus. In addition, a single check bit is provided for the 
system interface command bus SysCmdP. 

The eight check bits for the system interface address and data bus 
provide either even-byte parity or are generated in accordance with a 
single error correcting double error detecting (SECDED) code that also 
detects any three or four bit error in a nibble. (See Appendix C for 
details.) The eight check bits for each half of the secondary cache data 
bus are always generated in accordance with the SECDED code. 

The processor checks data using parity or the SECDED code as it
passes from the system interface to the secondary cache and as it is
moved from the secondary cache to the primary cache or to the system
interface. The processor passes the check bits for data accessed from
the secondary cache directly to the system interface without change as
it checks the data. The processor does not check data received from the
system interface for external updates and external writes. It is possible
to force the processor not to check data from the system interface for
read responses using a bit in the data identifier. The processor does
generate correct check bits for doubleword, word, or partial-word
data transmitted to the system interface. The processor does not check
addresses received from the system interface, but does generate
correct check bits for addresses transmitted to the system interface.

The processor does not contain a data corrector; instead, the processor
takes a cache error exception when an error is detected based on the
data check bits. Software, in conjunction with an off-processor data
corrector, is responsible for correcting the data when the SECDED
code is employed.

The seven check bits for the secondary cache tag bus are generated in 
accordance with a single error correcting double error detecting 
(SECDED) code that also detects any three or four bit error in a nibble. 
The processor generates check bits for the tag when it is written into 
the secondary cache and checks the tag whenever the secondary cache 
is accessed. The processor contains a corrector for the secondary cache 
tag. The tag corrector is not in-line for processor accesses due to 
primary cache misses. When a tag error is detected on a processor 
access due to a primary cache miss, the processor will trap. Software, 
using the processor cache management primitives, corrects the tag. 
When executing the cache management primitives, the processor uses 
the corrected tag to generate write back addresses and cache state. For 
external accesses, the tag corrector is in-line; that is, the response to 
external accesses is based on the corrected tag. The processor still traps
on tag errors detected during external accesses to allow software to
repair the contents of the cache if possible.

The check bit for the system interface command bus provides even
parity over the nine bits of the system interface command bus. This
parity bit is generated correctly when the system interface is in master
state, but is not checked when the system interface is in slave state.

The buses that are covered by check bits, their contents, and whether
or not they are checked for various processor internal and external
transactions are summarized in Table 15-1, Table 15-2, Table 15-3, and
Table 15-4.
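
The parity options described in this section can be sketched in Python as follows. This is an illustration under stated assumptions: "even parity" is taken to mean that the check bit makes the total number of ones (data plus check bit) even, and for byte parity on SysAD it is assumed that check bit n covers data byte n. Neither the function names nor that byte-to-bit mapping come from this manual; the SECDED code itself is described in Appendix C and is not modeled here.

```python
def even_parity_bit(value, width):
    """Check bit that makes the total count of ones (data + check) even."""
    ones = bin(value & ((1 << width) - 1)).count("1")
    return ones & 1


def syscmd_parity(syscmd):
    """Even parity over the nine bits of the system interface command bus."""
    return even_parity_bit(syscmd, 9)


def sysad_byte_parity(sysad):
    """Eight check bits, one even-parity bit per byte of a 64-bit value.

    Assumes check bit n covers bits 8n+7..8n of the data.
    """
    check = 0
    for byte_index in range(8):
        byte = (sysad >> (8 * byte_index)) & 0xFF
        check |= even_parity_bit(byte, 8) << byte_index
    return check
```

A receiver can check a command by recomputing the bit and comparing it with the received SysCmdP value.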
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Table 15-1 Error Checking and Correcting Summary for Internal Transactions 



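Bus: Processor or Secondary Cache Data
    Secondary Cache to Primary Cache:   Checked, Trap on Error
    Primary Cache to Secondary Cache:   Primary Cache parity
    Uncached Load:                      From System Interface
    Uncached Store:                     Not Checked

Bus: Secondary Cache Data Check Bits
    Secondary Cache to Primary Cache:   Checked, Trap on Error
    Primary Cache to Secondary Cache:   Generated
    Uncached Load:                      NA
    Uncached Store:                     NA

Bus: Secondary Cache Tag and Check Bits
    Secondary Cache to Primary Cache:   Checked, not corrected, Trap on Error
    Primary Cache to Secondary Cache:   Generated
    Uncached Load:                      NA
    Uncached Store:                     NA

Bus: System Internal Addr/Cmd and Check Bits: Transmit
    Secondary Cache to Primary Cache:   NA
    Primary Cache to Secondary Cache:   NA
    Uncached Load:                      Generated
    Uncached Store:                     Generated

Bus: System Internal Addr/Cmd and Check Bits: Receive
    Secondary Cache to Primary Cache:   NA
    Primary Cache to Secondary Cache:   NA
    Uncached Load:                      Not Checked
    Uncached Store:                     NA

Bus: System Internal Data
    Secondary Cache to Primary Cache:   NA
    Primary Cache to Secondary Cache:   NA
    Uncached Load:                      Not Checked
    Uncached Store:                     From Processor

Bus: System Internal Data Check Bits
    Secondary Cache to Primary Cache:   NA
    Primary Cache to Secondary Cache:   NA
    Uncached Load:                      Not Checked
    Uncached Store:                     Generated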
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Table 15-2 Error Checking and Correcting Summary for Internal Transactions 



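Bus: Processor or Secondary Cache Data
    Store to Shared Cache Line:               Checked on read part of RMW, Trap on Error
    Cache Instruction:                        Not Checked
    Secondary Cache Load from System Int:     From System Int unchanged
    Secondary Cache Write to System Int:      Checked, Trap on Error

Bus: Secondary Cache Data Check Bits
    Store to Shared Cache Line:               Checked on read part of RMW, Trap on Error
    Cache Instruction:                        Not Checked
    Secondary Cache Load from System Int:     From System Int unchanged
    Secondary Cache Write to System Int:      Checked, Trap on Error

Bus: Secondary Cache Tag & Check Bits
    Store to Shared Cache Line:               Checked on read part of RMW, Trap on Error
    Cache Instruction:                        Checked and corrected
    Secondary Cache Load from System Int:     Generated
    Secondary Cache Write to System Int:      Checked, not corrected, Trap on Error

Bus: System Internal Addr/Cmd and Check Bits: Transmit
    Store to Shared Cache Line:               Generated
    Cache Instruction:                        Generated
    Secondary Cache Load from System Int:     Generated
    Secondary Cache Write to System Int:      Generated

Bus: System Internal Addr/Cmd and Check Bits: Receive
    Store to Shared Cache Line:               NA
    Cache Instruction:                        NA
    Secondary Cache Load from System Int:     Not Checked
    Secondary Cache Write to System Int:      NA

Bus: System Internal Data
    Store to Shared Cache Line:               From Processor
    Cache Instruction:                        From Secondary
    Secondary Cache Load from System Int:     Checked, Trap on Error
    Secondary Cache Write to System Int:      From Secondary

Bus: System Internal Data Check Bits
    Store to Shared Cache Line:               Generated
    Cache Instruction:                        From Secondary Cache
    Secondary Cache Load from System Int:     Checked, Trap on Error
    Secondary Cache Write to System Int:      From Secondary Cache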
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Table 15-3 Errror Checking and Correcting Summary for External Transactions 



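Bus: Processor or Secondary Cache Data
    Read Request:         NA
    Write Request:        NA
    Invalidate Request:   Not Checked
    Update Request:       Checked on read part of RMW, Trap on Error [1]

Bus: Secondary Cache Data Check Bits
    Read Request:         NA
    Write Request:        NA
    Invalidate Request:   Not Checked
    Update Request:       Checked on read part of RMW, Trap on Error [1],
                          Generation on write part of RMW if written

Bus: Secondary Cache Tag & Check Bits
    Read Request:         NA
    Write Request:        NA
    Invalidate Request:   Checked on read part of RMW, Trap on Error,
                          Generation on write part of RMW if written
    Update Request:       Checked on read part of RMW, Trap on Error,
                          Generation on write part of RMW if written

Bus: System Internal Addr/Cmd and Check Bits: Transmit
    Read Request:         Generated
    Write Request:        NA
    Invalidate Request:   NA
    Update Request:       NA

Bus: System Internal Addr/Cmd and Check Bits: Receive
    Read Request:         Not Checked
    Write Request:        Not Checked
    Invalidate Request:   Not Checked
    Update Request:       Not Checked

Bus: System Internal Data
    Read Request:         From Processor
    Write Request:        Not Checked
    Invalidate Request:   Not Checked
    Update Request:       Not Checked

Bus: System Internal Data Check Bits
    Read Request:         Generated
    Write Request:        Not Checked
    Invalidate Request:   Not Checked
    Update Request:       Not Checked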
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Table 154 Error Checking and Correcting Summary for External Transactions 



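Bus: Processor or Secondary Cache Data
    Intervention Request, Data Returned:    Checked, Trap on Error
    Intervention Request, State Returned:   Not Checked
    Snoop Request:                          Not Checked

Bus: Secondary Cache Data Check Bits
    Intervention Request, Data Returned:    Checked, Trap on Error
    Intervention Request, State Returned:   Not Checked
    Snoop Request:                          Not Checked

Bus: Secondary Cache Tag & Check Bits
    Intervention Request, Data Returned:    Checked and corrected on read part of RMW,
                                            Trap on Error, Generation on write part of
                                            RMW if written.
    Intervention Request, State Returned:   Checked and corrected on read part of RMW,
                                            Trap on Error, Generation on write part of
                                            RMW if written.
    Snoop Request:                          Checked and corrected on read part of RMW,
                                            Trap on Error, Generation on write part of
                                            RMW if written.

Bus: System Internal Addr/Cmd and Check Bits: Transmit
    Intervention Request, Data Returned:    Generated
    Intervention Request, State Returned:   Generated
    Snoop Request:                          Generated

Bus: System Internal Addr/Cmd and Check Bits: Receive
    Intervention Request, Data Returned:    Not Checked
    Intervention Request, State Returned:   Not Checked
    Snoop Request:                          Not Checked

Bus: System Internal Data
    Intervention Request, Data Returned:    From Secondary
    Intervention Request, State Returned:   NA
    Snoop Request:                          NA

Bus: System Internal Data Check Bits
    Intervention Request, Data Returned:    From Secondary Cache
    Intervention Request, State Returned:   NA
    Snoop Request:                          NA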
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Electrical Characteristics 



NOTE: The designer is advised to consult the vendor-specific data 
sheets for the exact information on electrical characteristics. The 
information in this chapter is provided as a reference only. 



Maximum Ratings 



Operation beyond the limits set forth in Table 16-1 may impair the 
useful life of the device. 

Table 16-1 Maximum Ratings 



Parameter               Symbol   Min        Max      Units

Supply Voltage          VCC      -0.5       +7.0     Volts
Input Voltage           VIN      -0.5 (1)   +7.0     Volts
Storage Temperature     TST      -65.0      +150.0   Degrees C
Operating Temperature   TC       0          +85.0    Degrees C



(1) VIN Min. = -3.0V for pulse width less than 15 ns. 



NOTE: No more than one output should be shorted at a time.
Duration of the short should not exceed 30 seconds.
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Operating Range 

Table 16-2 Operating Range 



Range        Case (TC)    VCC

Commercial   0C to 85C    5V ±5%



Operating Parameters 

Table 16-3 Operating Parameters 








                                                                50MHz
Parameter                         Symbol   Conditions           Min        Max       Units

Output HIGH Voltage               VOH      VCC = Min.           3.5                  V
Clock Output HIGH Voltage (3)     VOHC     VCC = Min.           4.0                  V
Output LOW Voltage                VOL      VCC = Min.                      0.4       V
Input HIGH Voltage (2)            VIH                           2          VCC+0.5   V
Input LOW Voltage (1,2)           VIL                           -0.5 (1)   0.8       V
MasterClock Input HIGH Voltage    VIHC                          0.8VCC     VCC+0.5   V
MasterClock Input LOW Voltage     VILC                          -0.5 (1)   0.2VCC    V
Input Capacitance                 CIn                                      10        pF
Output Capacitance                COut                                     10        pF
Operating Current                 ICC      VCC = 5V, TC = 0C               3         A
Input Leakage                     ILeak                                    10        uA
Input/Output Leakage              IOLeak                                   20        uA

Notes:

(1) VIL Min. = -3.0V for pulse width less than 15 ns.

(2) Except for MasterClock and SyncIn input.

(3) Applies to TClock, RClock, MasterOut, ModeClock and SyncOut outputs.
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Table 16-4 MasterClock and Clock Parameters 



                                                           50 MHz
Parameter                Symbol      Test Conditions     Min   Max        Units
MasterClock High         tMCHigh     Transition < 5ns    4                ns
MasterClock Low          tMCLow      Transition < 5ns    4                ns
MasterClock Freq (1)                                     25    50         MHz
MasterClock Period       tMCP                            20    40         ns
Clock Jitter             tMCJitter                             500        ps
MasterClock Rise Time    tMCRise                               5          ns
MasterClock Fall Time    tMCFall                               5          ns
ModeClock Period         tModeCKP                              256*tMCP   ns

Notes:

(1) Operation of the R4000 is only guaranteed with the Phase Lock Loop enabled.
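The frequency and period rows of Table 16-4 describe the same limits: a
25 MHz to 50 MHz MasterClock corresponds to a 40 ns to 20 ns period. The
following is a minimal sketch of that arithmetic. The function names are
illustrative only and are not part of any R4000 tool.

```python
def period_ns(freq_mhz: float) -> float:
    """Return the clock period in nanoseconds for a frequency in MHz."""
    return 1000.0 / freq_mhz


def mode_clock_period_ns(master_freq_mhz: float) -> float:
    """ModeClock period, taken from Table 16-4 as 256 * tMCP."""
    return 256 * period_ns(master_freq_mhz)


if __name__ == "__main__":
    for mhz in (25, 50):
        print(mhz, "MHz ->", period_ns(mhz), "ns,",
              "ModeClock period", mode_clock_period_ns(mhz), "ns")
```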
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System Interface Parameters 

Table 16-5 System Interface Parameters 



                                                     50MHz
Parameter      Symbol   Test Conditions            Min   Max   Units
Data Output    tDO      Maximum Slew Rate          2     10    ns
                          Modebits[53:56]=0
                          Modebits[57:60]=15
                        Minimum Slew Rate          6     16    ns
                          Modebits[53:56]=15
                          Modebits[57:60]=0
                        MC*0.5 Drive Time          TBD   TBD   ns
                          Modebit[50:52]=100
                        MC*0.75 Drive Time         TBD   TBD   ns
                          Modebit[50:52]=010
                        MC*1.0 Drive Time          TBD   TBD   ns
                          Modebit[50:52]=001
Data Setup     tDS                                 5           ns
Data Hold      tDH                                 2           ns

Notes:

(1) When the dynamic output slew rate control Mode bits [61] or [62] are enabled, the initial values
for the pull-up and pull-down rates should be set to the slowest value, Modebits[53:56]=15,
Modebits[57:60]=0.

(2) Timings are measured from 1.5V of the clock to 1.5V of signal.

(3) Capacitive load for all output timings is 50 pF.

(4) Data Output, Data Setup and Data Hold apply to all logic signals driven out of or driven into the
R4000 on the system interface. Secondary cache signals are specified separately.

NOTE: All output timing specifications given assume 50 pF of
capacitive load. Output timing specifications should be derated where
appropriate as shown in Table 16-7 below.
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Secondary Cache Interface Parameters 

Table 16-6 Secondary Cache Interface Parameters

                                                                 50MHz
Parameter                         Symbol    Test Conditions        Min   Max   Units
MasterClock to Output (1,2,3)     tSCO      Maximum Slew Rate      2     10    ns
                                              Modebits[53:56]=0
                                              Modebits[57:60]=15
                                            Minimum Slew Rate      6     16    ns
                                              Modebits[53:56]=15
                                              Modebits[57:60]=0
                                            MC*0.5 Drive Time      TBD   TBD   ns
                                              Modebit[50:52]=100
                                            MC*0.75 Drive Time     TBD   TBD   ns
                                              Modebit[50:52]=010
                                            MC*1.0 Drive Time      TBD   TBD   ns
                                              Modebit[50:52]=001
Data Setup                        tSCDS                            5           ns
Data Hold                         tSCDH                            2           ns
Cycle length of 4-word read       tRd1Cyc                          4     15    cycles
Cycles between read and write     tDis                             2     7     cycles
Cycle length of 8-word read       tRd2Cyc                          3     15    cycles
Cycles between Address and        tWr1Dly                          1     3     cycles
  SCWr*
Cycles between deassertion of     tWrRc                                  1     cycles
  SCWr* to the start of the
  next cycle
Cycles from second doubleword     tWrSUp                           3     15    cycles
  to SCWr*
Cycles between first and second   tWr2Dly                          1     3     cycles
  data word in 8-word write

Notes:

(1) When the dynamic output slew rate control Mode bits [61] or [62] are enabled, the initial values
for the pull-up and pull-down rates should be set to the slowest value, Modebits[53:56]=15,
Modebits[57:60]=0.

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(2) Timings are measured from 1.5V of the clock to 1.5V of signal. 

(3) Capacitive load for all output timings is 50pf. 

(4) Number of cycles is configured through the boot time mode control. Section 9.0 specifies the 
boot time mode interface. 

Capacitive Load Deration 

Table 16-7 Capacitive Load Deration 



                          50 MHz
Parameter     Symbol    Min   Max   Units
Load Derate   CLD             2     ns/25pF
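As a worked illustration of Table 16-7, the sketch below applies the load
derate to an output timing that was specified at 50 pF. It assumes the
derate scales linearly with the extra capacitance and that loads lighter
than 50 pF receive no credit; neither assumption is stated in the table,
and the function name is hypothetical.

```python
BASE_LOAD_PF = 50.0         # load assumed by the output timing tables
DERATE_NS_PER_25PF = 2.0    # maximum Load Derate (CLD) from Table 16-7


def derated_delay_ns(spec_ns: float, load_pf: float) -> float:
    """Return an output delay adjusted for a capacitive load above 50 pF.

    Assumption: linear scaling of 2 ns per additional 25 pF.
    """
    extra_pf = max(0.0, load_pf - BASE_LOAD_PF)
    return spec_ns + DERATE_NS_PER_25PF * extra_pf / 25.0


if __name__ == "__main__":
    # A 10 ns maximum output delay driving 75 pF instead of 50 pF.
    print(derated_delay_ns(10.0, 75.0))
```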
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Physical Specifications 



Signal to Pin Correlation of R4000PC 



Table 16-8 lists the PC package signal layout; signals are listed
alphabetically left to right and run from the top row on down.





Table 16-8 R4000 PC Package Layout 






R4000         PC Pkg    R4000         PC Pkg    R4000         PC Pkg
Function      Pin       Function      Pin       Function      Pin
ColdResetB    T14       ExtRqstB      U2        FaultB        B16
IOIn          T13       IOOut         U12       IntB0         N2
IntB1         L3        IntB2         K3        IntB3         J3
IntB4         H3        IntB5         F2        JTCK          H17
JTDI          G16       JTDO          F16       JTMS          E16
MasterClock   J17       MasterOut     P17       ModeClock     B4
ModeIn        U4        NMIB          U7        NoConnect     U10
PLLCap0       ****      PLLCap1       ****      RClock0       T17
RClock1       R16       RdRdyB        T5        ReleaseB      V5
ResetB        U16       SyncIn        J16       SyncOut       P16
SysAD0        J2        SysAD1        G2        SysAD2        E1
SysAD3        E3        SysAD4        C2        SysAD5        C4
SysAD6        B5        SysAD7        B6        SysAD8        B9
SysAD9        B11       SysAD10       C12       SysAD11       B14
SysAD12       B15       SysAD13       C16       SysAD14       D17
SysAD15       E18       SysAD16       K2        SysAD17       M2
SysAD18       P1        SysAD19       P3        SysAD20       T2
SysAD21       T4        SysAD22       U5        SysAD23       U6
SysAD24       U9        SysAD25       U11       SysAD26       T12
SysAD27       U14       SysAD28       U15       SysAD29       T16
SysAD30       R17       SysAD31       M16       SysAD32       H2
SysAD33       G3        SysAD34       F3        SysAD35       D2
SysAD36       C3        SysAD37       B3        SysAD38       C6
SysAD39       C7        SysAD40       C10       SysAD41       C11
SysAD42       B13       SysAD43       A15       SysAD44       C15
SysAD45       B17       SysAD46       E17       SysAD47       F17
SysAD48       L2        SysAD49       M3        SysAD50       N3
SysAD51       R2        SysAD52       T3        SysAD53       U3
SysAD54       T6        SysAD55       T7        SysAD56       T10
SysAD57       T11       SysAD58       U13       SysAD59       V15
SysAD60       T15       SysAD61       U17       SysAD62       N16
SysAD63       N17       SysADC0       C8        SysADC1       G17
SysADC2       T8        SysADC3       L16       SysADC4       B8
SysADC5       H16       SysADC6       U8        SysADC7       L17
SysCmd0       E2        SysCmd1       D3        SysCmd2       B2
SysCmd3       A5        SysCmd4       B7        SysCmd5       C9
SysCmd6       B10       SysCmd7       B12       SysCmd8       C13
SysCmdP       C14       TClock0       C17       TClock1       D16
VCCOk         M17       ValidInB      P2        ValidOutB     R3
WrRdyB        C5        VccP          K17       VssP          K16
Vcc           A2        Vcc           A4        Vcc           A7
Vcc           A9        Vcc           A11       Vcc           A13
Vcc           A16       Vcc           B18       Vcc           C1
Vcc           D18       Vcc           F1        Vcc           G18
Vcc           H1        Vcc           J18       Vcc           K1
Vcc           L18       Vcc           M1        Vcc           N18
Vcc           R1        Vcc           T9        Vcc           T18
Vcc           U1        Vcc           V3        Vcc           V6
Vcc           V8        Vcc           V10       Vcc           V12
Vcc           V14       Vcc           V17       Vss           A3
Vss           A6        Vss           A8        Vss           A10
Vss           A12       Vss           A14       Vss           A17
Vss           A18       Vss           B1        Vss           C18
Vss           D1        Vss           F18       Vss           G1
Vss           H18       Vss           J1        Vss           K18
Vss           L1        Vss           M18       Vss           N1
Vss           P18       Vss           R18       Vss           T1
Vss           U18       Vss           V1        Vss           V2
Vss           V4        Vss           V7        Vss           V9
Vss           V11       Vss           V13       Vss           V16
Vss           V18
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Pinout of R4000PC 

Figure 16-1 shows the physical pinout of the R4000PC. 



Figure 16-1 Pinout of R4000PC
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Signal to Pin Correlation of R4000MC/SC 

Table 16-9 lists the MC/SC package signal layout; signals are listed
alphabetically left to right and run from the top row on down.

Table 16-9 R4000MC/SC Package Layout 



R4000         MC/SC Pkg   R4000         MC/SC Pkg   R4000         MC/SC Pkg
Function      Pin         Function      Pin         Function      Pin
ColdResetB    AW37        ExtRqstB      AV2         FaultB        C39
IOIn          AV32        IOOut         AV28        IntB0         AL1
IvdAckB       AA35        IvdErrB       AA39        JTCK          U39
JTDI          N39         JTDO          J39         JTMS          G37
MasterClock   AA37        MasterOut     AJ39        ModeClock     B8
ModeIn        AV8         NMIB          AV16        NoConnect     AV24
NoConnect     Y2          PLLCap0       ****        PLLCap1       ****
RClock0       AM34        RClock1       AL33        RdRdyB        AW7
ReleaseB      AV12        ResetB        AU39        SCAPar0       U5
SCAPar1       U1          SCAPar2       P4          SCAddr1       AL5
SCAddr2       AG1         SCAddr3       AE7         SCAddr4       AC1
SCAddr5       AC5         SCAddr6       AC3         SCAddr7       AA1
SCAddr8       AB4         SCAddr9       AA5         SCAddr10      AA7
SCAddr11      AA3         SCAddr12      W3          SCAddr13      Y6
SCAddr14      W5          SCAddr15      W7          SCAddr16      W1
SCAddr17      U3          SCAddr0W      AN7         SCAddr0X      AN5
SCAddr0Y      AM6         SCAddr0Z      AL7         SCDCSB        M6
SCDChk0       G19         SCDChk1       T34         SCDChk2       AP20
SCDChk3       AD34        SCDChk4       C19         SCDChk5       R37
SCDChk6       AU19        SCDChk7       AE37        SCDChk8       C17
SCDChk9       N37         SCDChk10      AU17        SCDChk11      AG37
SCDChk12      E19         SCDChk13      R35         SCDChk14      AR19
SCDChk15      AE35        SCData0       R3          SCData1       R7
SCData2       L5          SCData3       F8          SCData4       C9
SCData5       F12         SCData6       G15         SCData7       E17
SCData8       G21         SCData9       C25         SCData10      G25
SCData11      E29         SCData12      G31         SCData13      C35
SCData14      K36         SCData15      N35         SCData16      AE3
SCData17      AG5         SCData18      AK4         SCData19      AN9
SCData20      AU9         SCData21      AN13        SCData22      AT14
SCData23      AR17        SCData24      AT22        SCData25      AU25
SCData26      AN27        SCData27      AR29        SCData28      AN31
SCData29      AR35        SCData30      AK36        SCData31      AG35
SCData32      T6          SCData33      L3          SCData34      L7
SCData35      E7          SCData36      G11         SCData37      E13
SCData38      E15         SCData39      G17         SCData40      C23
SCData41      F24         SCData42      E27         SCData43      D30
SCData44      C33         SCData45      E35         SCData46      L35
SCData47      R33         SCData48      AF4         SCData49      AJ3
SCData50      AJ7         SCData51      AP8         SCData52      AT10
SCData53      AR13        SCData54      AR15        SCData55      AT18
SCData56      AU23        SCData57      AT26        SCData58      AR27
SCData59      AN29        SCData60      AP32        SCData61      AN35
SCData62      AJ35        SCData63      AE33        SCData64      V4
SCData65      R5          SCData66      N5          SCData67      E5
SCData68      G9          SCData69      E11         SCData70      G13
SCData71      D14         SCData72      C21         SCData73      D22
SCData74      E25         SCData75      G27         SCData76      C31
SCData77      F32         SCData78      J35         SCData79      M34
SCData80      AC7         SCData81      AE5         SCData82      AG7
SCData83      AR5         SCData84      AR9         SCData85      AR11
SCData86      AN15        SCData87      AP16        SCData88      AU21
SCData89      AN23        SCData90      AR25        SCData91      AP28
SCData92      AU31        SCData93      AR33        SCData94      AL35
SCData95      AH34        SCData96      U7          SCData97      N3
SCData98      N7          SCData99      C5          SCData100     E9
SCData101     C11         SCData102     C13         SCData103     F16
SCData104     E21         SCData105     G23         SCData106     C27
SCData107     F28         SCData108     E31         SCData109     G33
SCData110     J37         SCData111     N33         SCData112     AD6
SCData113     AG3         SCData114     AJ5         SCData115     AU5
SCData116     AN11        SCData117     AU11        SCData118     AU13
SCData119     AN17        SCData120     AR21        SCData121     AP24
SCData122     AU27        SCData123     AT30        SCData124     AU33
SCData125     AN33        SCData126     AL37        SCData127     AG33
SCOEB         N1          SCTCSB        J1          SCTChk0       AN21
SCTChk1       AN19        SCTChk2       AU15        SCTChk3       AP12
SCTChk4       AU7         SCTChk5       AR7         SCTChk6       AH6
SCTag0        K4          SCTag1        G7          SCTag2        C7
SCTag3        D10         SCTag4        C15         SCTag5        D18
SCTag6        F20         SCTag7        E23         SCTag8        D26
SCTag9        C29         SCTag10       G29         SCTag11       E33
SCTag12       G35         SCTag13       L33         SCTag14       L37
SCTag15       P36         SCTag16       AF36        SCTag17       AJ37
SCTag18       AJ33        SCTag19       AN37        SCTag20       AU35
SCTag21       AR31        SCTag22       AU29        SCTag23       AN25
SCTag24       AR23        SCWrWB        J5          SCWrXB        J7
SCWrYB        H6          SCWrZB        G5          Status0       U33
Status1       U35         Status2       V36         Status3       W35
Status4       W37         Status5       AC37        Status6       AC35
Status7       AC33        SyncIn        W39         SyncOut       AN39
SysAD0        T2          SysAD1        M2          SysAD2        J3
SysAD3        G3          SysAD4        C1          SysAD5        A3
SysAD6        A9          SysAD7        A13         SysAD8        A21
SysAD9        A25         SysAD10       A29         SysAD11       A33
SysAD12       B38         SysAD13       E37         SysAD14       G39
SysAD15       L39         SysAD16       AD2         SysAD17       AH2
SysAD18       AL3         SysAD19       AN3         SysAD20       AU1
SysAD21       AW3         SysAD22       AW9         SysAD23       AW13
SysAD24       AW21        SysAD25       AW25        SysAD26       AW29
SysAD27       AW33        SysAD28       AV38        SysAD29       AR37
SysAD30       AM38        SysAD31       AH38        SysAD32       R1
SysAD33       L1          SysAD34       H2          SysAD35       E1
SysAD36       C3          SysAD37       A5          SysAD38       A11
SysAD39       A15         SysAD40       A23         SysAD41       A27
SysAD42       A31         SysAD43       A35         SysAD44       C37
SysAD45       E39         SysAD46       H38         SysAD47       M38
SysAD48       AE1         SysAD49       AJ1         SysAD50       AM2
SysAD51       AR1         SysAD52       AU3         SysAD53       AW5
SysAD54       AW11        SysAD55       AW15        SysAD56       AW23
SysAD57       AW27        SysAD58       AW31        SysAD59       AW35
SysAD60       AU37        SysAD61       AR39        SysAD62       AL39
SysAD63       AG39        SysADC0       A17         SysADC1       R39
SysADC2       AW17        SysADC3       AD38        SysADC4       A19
SysADC5       T38         SysADC6       AW19        SysADC7       AC39
SysCmd0       G1          SysCmd1       E3          SysCmd2       B2
SysCmd3       B12         SysCmd4       B16         SysCmd5       B20
SysCmd6       B24         SysCmd7       B28         SysCmd8       A32
SysCmdP       A37         TClock0       H34         TClock1       J33
VCCOk         AE39        ValidInB      AN1         ValidOutB     AR3
WrRdyB        A7          VccSense      W33         VssSense      U37
VccP          AA33        VssP          Y34         Vcc           A39
Vcc           B6          Vcc           B10         Vcc           B18
Vcc           B26         Vcc           B34         Vcc           D4
Vcc           D16         Vcc           D24         Vcc           D32
Vcc           D36         Vcc           F2          Vcc           F14
Vcc           F22         Vcc           F30         Vcc           F38
Vcc           H4          Vcc           H36         Vcc           K6
Vcc           K38         Vcc           P2          Vcc           P34
Vcc           T4          Vcc           T36         Vcc           V6
Vcc           V38         Vcc           Y38         Vcc           AB2
Vcc           AB34        Vcc           AD4         Vcc           AD36
Vcc           AF6         Vcc           AF38        Vcc           AK2
Vcc           AK34        Vcc           AM4         Vcc           AM36
Vcc           AP2         Vcc           AP10        Vcc           AP18
Vcc           AP26        Vcc           AP38        Vcc           AT4
Vcc           AT8         Vcc           AT16        Vcc           AT24
Vcc           AT32        Vcc           AT36        Vcc           AV6
Vcc           AV14        Vcc           AV20        Vcc           AV22
Vcc           AV30        Vcc           AV34        Vcc           AW1
Vcc           AW39        Vss           B4          Vss           B14
Vss           B22         Vss           B30         Vss           B36
Vss           D2          Vss           D6          Vss           D12
Vss           D20         Vss           D28         Vss           D34
Vss           D38         Vss           F4          Vss           F6
Vss           F10         Vss           F18         Vss           F26
Vss           F34         Vss           F36         Vss           K2
Vss           K34         Vss           M4          Vss           M36
Vss           P6          Vss           P38         Vss           V2
Vss           V34         Vss           Y4          Vss           Y36
Vss           AB6         Vss           AB36        Vss           AB38
Vss           AF2         Vss           AF34        Vss           AH4
Vss           AH36        Vss           AK6         Vss           AK38
Vss           AP4         Vss           AP6         Vss           AP14
Vss           AP22        Vss           AP30        Vss           AP34
Vss           AP36        Vss           AT2         Vss           AT6
Vss           AT12        Vss           AT20        Vss           AT28
Vss           AT34        Vss           AT38        Vss           AV4
Vss           AV10        Vss           AV18        Vss           AV26
Vss           AV36
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Pinout of R4000MC/SC Package 

Figure 16-2 shows the physical pinout of the R4000MC and SC. 
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Figure 16-2 Pinout of the R4000MC and R4000SC 
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A 



This appendix provides a detailed description of the operation of each 
R4000 instruction in both 32- and 64-bit modes. The instructions are 
listed in alphabetical order. 
Refer to Appendix B for a detailed description of the FPU instructions. 

The exceptions that may occur due to the execution of each instruction 
are listed after the description of each instruction. The description of 
the immediate causes and manner of handling exceptions is omitted 
from the instruction descriptions in this chapter. Refer to Chapter 5 for 
detailed descriptions of exceptions and handling. 
Figures at the end of this appendix list the bit encoding for the
constant fields of each instruction, and the bit encoding for each
individual instruction is included with that instruction.
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Instruction Classes 

CPU instructions are divided into the following classes: 

• Load and Store instructions move data between memory
and general registers. They are all I-type instructions, since
the only addressing mode supported is base register + 16-bit
immediate offset.

• Computational instructions perform arithmetic, logical 
and shift operations on values in registers. They occur in 
both R-type (both operands are registers) and I-type (one 
operand is a 16-bit immediate) formats. 

• Jump and Branch instructions change the control flow of a 
program. Jumps are always to absolute 26-bit word 
addresses (J-type format), or register addresses (R-type, for 
returns and dispatches). Branches have 16-bit offsets 
relative to the program counter (I-type). Jump and Link 
instructions save a return address in Register 31. 

• Coprocessor instructions perform operations in the
coprocessors. Coprocessor loads and stores are I-type.
Coprocessor computational instructions have
coprocessor-dependent formats (see the FPU instructions).
Coprocessor zero (CP0) instructions manipulate the memory
management and exception handling facilities of the
processor.

• Special instructions perform a variety of tasks, including 
movement of data between special and general registers, 
trap, and breakpoint. They are always R-type. 
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Instruction Formats 



Every CPU instruction consists of a single word (32 bits) aligned on a
word boundary. The major instruction formats are shown in Figure
A-1.



I-Type (Immediate)

 31      26 25    21 20    16 15                            0
 op         rs       rt       immediate

J-Type (Jump)

 31      26 25                                              0
 op         target

R-Type (Register)

 31      26 25    21 20    16 15    11 10     6 5           0
 op         rs       rt       rd       shamt    funct

where:

 op          is a 6-bit operation code
 rs          is a 5-bit source register specifier
 rt          is a 5-bit target (source/destination)
             register or branch condition
 immediate   is a 16-bit immediate, branch displacement
             or address displacement
 target      is a 26-bit jump target address
 rd          is a 5-bit destination register specifier
 shamt       is a 5-bit shift amount
 funct       is a 6-bit function field

Figure A-1 CPU Instruction Formats
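The field boundaries in the figure above can be sketched as shifts and
masks on a 32-bit instruction word. This is an illustrative decoder only;
the function name `decode_fields` is an assumption. Every field is
extracted regardless of the instruction's actual format, so the caller
picks the fields that apply.

```python
def decode_fields(word: int) -> dict:
    """Split a 32-bit instruction word into the I-, J- and R-type fields."""
    word &= 0xFFFFFFFF
    return {
        "op": (word >> 26) & 0x3F,       # bits 31..26
        "rs": (word >> 21) & 0x1F,       # bits 25..21
        "rt": (word >> 16) & 0x1F,       # bits 20..16
        "rd": (word >> 11) & 0x1F,       # bits 15..11 (R-type)
        "shamt": (word >> 6) & 0x1F,     # bits 10..6  (R-type)
        "funct": word & 0x3F,            # bits 5..0   (R-type)
        "immediate": word & 0xFFFF,      # bits 15..0  (I-type)
        "target": word & 0x03FFFFFF,     # bits 25..0  (J-type)
    }


if __name__ == "__main__":
    # SPECIAL opcode, rs=1, rt=2, rd=3, shamt=0, funct=100000 (ADD rd, rs, rt)
    print(decode_fields(0x00221820))
```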
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Instruction Notation Conventions 



In this appendix, all variable subfields in an instruction format (such
as rs, rt, immediate, etc.) are shown in lowercase names.
For the sake of clarity, we sometimes use an alias for a variable
subfield in the formats of specific instructions. For example, we use
rs = base in the format for load and store instructions. Such an alias is
always lower case, since it refers to a variable subfield.
Figures with the actual bit encoding for all the mnemonics are located
at the end of this Appendix, and the bit encoding also accompanies
each instruction.

In the instruction descriptions that follow, the Operation section
describes the operation performed by each instruction using a high-level
language notation. The R4000 can operate as either a 32- or 64-bit
microprocessor. The operation for both modes is included with the
instruction description. Special symbols used in the notation are
described in Table A-1.
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Table A-l CPU Instruction Operation Notations 



Symbol          Meaning

<-              Assignment.

||              Bit string concatenation.

x^y             Replication of bit value x into a y-bit string. Note: x is
                always a single-bit value.

x_y..z          Selection of bits y through z of bit string x. Little-endian
                bit notation is always used. If y is less than z, this
                expression is an empty (zero length) bit string.

+               Two's complement or floating-point addition.

-               Two's complement or floating-point subtraction.

*               Two's complement or floating-point multiplication.

div             Two's complement integer division.

mod             Two's complement modulo.

/               Floating-point division.

<               Two's complement less than comparison.

and             Bitwise logic AND.

or              Bitwise logic OR.

xor             Bitwise logic XOR.

nor             Bitwise logic NOR.

GPR[x]          General-Register x. The content of GPR[0] is always zero.
                Attempts to alter the content of GPR[0] have no effect.

CPR[z,x]        Coprocessor unit z, general register x.

CCR[z,x]        Coprocessor unit z, control register x.

COC[z]          Coprocessor unit z condition signal.

BigEndianMem    Big-endian mode as configured at reset (0 -> Little,
                1 -> Big). Specifies the endianness of the memory interface
                (see LoadMemory and StoreMemory), and the endianness of
                Kernel and Supervisor mode execution.

ReverseEndian   Signal to reverse the endianness of load and store
                instructions. This feature is available in User mode only,
                and is effected by setting the RE bit of the Status register.
                Thus, ReverseEndian may be computed as (SR_25 and User mode).

BigEndianCPU    The endianness for load and store instructions (0 -> Little,
                1 -> Big). In User mode, this endianness may be reversed by
                setting SR_25. Thus, BigEndianCPU may be computed as
                BigEndianMem XOR ReverseEndian.

LLbit           Bit of state to specify synchronization instructions. Set by
                LL, cleared by ERET and Invalidate and read by SC.

T+i:            Indicates the time steps between operations. Each of the
                statements within a time step are defined to be executed in
                sequential order (as modified by conditional and loop
                constructs). Operations which are marked T+i: are executed
                at instruction cycle i relative to the start of execution of
                the instruction. Thus, an instruction which starts at time j
                executes operations marked T+i: at time i + j. The
                interpretation of the order of execution between two
                instructions or two operations which execute at the same
                time should be pessimistic; the order is not defined.
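The two endianness expressions given in the table, ReverseEndian = (SR
bit 25 and User mode) and BigEndianCPU = BigEndianMem XOR ReverseEndian,
can be sketched as follows. The function and parameter names are
illustrative assumptions; each flag is modelled as 0 or 1.

```python
def reverse_endian(sr25: int, user_mode: bool) -> int:
    """ReverseEndian: the RE bit (Status register bit 25) has effect in User mode only."""
    return 1 if (sr25 and user_mode) else 0


def big_endian_cpu(big_endian_mem: int, sr25: int, user_mode: bool) -> int:
    """BigEndianCPU = BigEndianMem XOR ReverseEndian (0 = Little, 1 = Big)."""
    return big_endian_mem ^ reverse_endian(sr25, user_mode)


if __name__ == "__main__":
    # Big-endian memory interface, RE bit set, running in User mode.
    print(big_endian_cpu(1, 1, True))
```

With a big-endian memory interface, setting the RE bit flips loads and
stores to little-endian in User mode and has no effect in Kernel or
Supervisor mode.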
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Instruction Notation Examples 



The following examples illustrate the application of some of the in- 
struction notation conventions: 



Example #1:

GPR[rt] <- immediate || 0^16

Sixteen zero bits are concatenated with an immediate
value (typically 16 bits), and the 32-bit string (with the lower
16 bits set to zero) is assigned to General-Purpose Register
rt.

Example #2:

(immediate_15)^16 || immediate_15..0

Bit 15 (the sign bit) of an immediate value is extended for
16 bit positions, and the result is concatenated with bits 15
through 0 of the immediate value to form a 32-bit sign
extended value.
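The two notation examples can be checked with ordinary integer
arithmetic. The sketch below models bit strings as Python integers; the
helper names `replicate`, `example1` and `example2` are hypothetical and
exist only for this illustration.

```python
def replicate(bit: int, count: int) -> int:
    """Replication: a count-bit string in which every bit equals `bit`."""
    return (1 << count) - 1 if bit else 0


def example1(immediate: int) -> int:
    """immediate || sixteen zero bits: the immediate lands in the upper halfword."""
    return (immediate & 0xFFFF) << 16


def example2(immediate: int) -> int:
    """Bit 15 replicated 16 times, concatenated with bits 15..0 (sign extension)."""
    imm = immediate & 0xFFFF
    sign = (imm >> 15) & 1
    return (replicate(sign, 16) << 16) | imm


if __name__ == "__main__":
    print(hex(example1(0x1234)))
    print(hex(example2(0x8000)))
```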
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Load and Store Instructions 



In the R4000 implementation, the instruction immediately following a
load may use the contents of the register loaded. In such cases, the
hardware interlocks, requiring additional real cycles, so scheduling
load delay slots is still desirable, although not required for functional
code.

Two special instructions are provided in the R4000 implementation of
the MIPS ISA, Load Linked and Store Conditional. These instructions
are used in carefully coded sequences to provide one of several
synchronization primitives, including test-and-set, bit-level locks,
semaphores, and sequencers/event counts.

In the load and store operation descriptions, the functions listed in
Table A-2 are used to summarize the handling of virtual addresses and
physical memory.
Table A-2 Load and Store Common Functions

Function             Meaning

AddressTranslation   Uses the TLB to find the physical address given the
                     virtual address. The function fails and an exception
                     is taken if the required translation is not present in
                     the TLB.

LoadMemory           Uses the cache and main memory to find the contents of
                     the word containing the specified physical address.
                     The low-order two bits of the address and the access
                     type field indicate which of the four bytes within
                     the data word need to be returned. If the cache is
                     enabled for this access, the entire word is returned
                     and loaded into the cache.

StoreMemory          Uses the cache, write buffer, and main memory to store
                     the word or part of word specified as data in the word
                     containing the specified physical address. The
                     low-order two bits of the address and the access type
                     field indicate which of the four bytes within the
                     data word should be stored.



The access type field indicates the size of the data item to be loaded or
stored as shown in Table A-3. Regardless of access type or
byte-numbering order (endianness), the address specifies the byte which
has the smallest byte address of the bytes in the addressed field. For a
Big-endian machine, this is the leftmost byte and contains the sign for a
2's-complement number; for a Little-endian machine, this is the rightmost
byte and contains the lowest precision byte.
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Table A-3 Access Type Specifications for Loads! Stores 



Access type
Mnemonic      Value   Meaning
DOUBLEWORD    7       doubleword (64 bits)
SEPTIBYTE     6       seven bytes (56 bits)
SEXTIBYTE     5       six bytes (48 bits)
QUINTIBYTE    4       five bytes (40 bits)
WORD          3       word (32 bits)
TRIPLEBYTE    2       triple-byte (24 bits)
HALFWORD      1       halfword (16 bits)
BYTE          0       byte (8 bits)



The bytes within the addressed doubleword which are used can be
determined directly from the access type and the three low-order bits of
the address, as shown in Chapter 3.
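In the access type table, each value is one less than the number of bytes
transferred. A minimal sketch of that relationship follows; the dictionary
and function names are illustrative, and the byte-lane selection described
in Chapter 3 is not modelled here.

```python
# Access type mnemonics and their encoded values, as listed in Table A-3.
ACCESS_TYPES = {
    "DOUBLEWORD": 7,
    "SEPTIBYTE": 6,
    "SEXTIBYTE": 5,
    "QUINTIBYTE": 4,
    "WORD": 3,
    "TRIPLEBYTE": 2,
    "HALFWORD": 1,
    "BYTE": 0,
}


def access_bytes(mnemonic: str) -> int:
    """Number of bytes transferred: the encoded value plus one."""
    return ACCESS_TYPES[mnemonic] + 1


def access_bits(mnemonic: str) -> int:
    """Number of bits transferred."""
    return 8 * access_bytes(mnemonic)


if __name__ == "__main__":
    for name in ACCESS_TYPES:
        print(name, access_bytes(name), access_bits(name))
```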



Jump and Branch Instructions 



All jump and branch instructions have an architectural delay of exactly
one instruction. That is, the instruction immediately following a
jump or branch (i.e., occupying the delay slot) is always executed
while the target instruction is being fetched from storage. It is not
valid for a delay slot to be occupied itself by a jump or branch
instruction; however, this error is not detected, and the results of
such an operation are undefined.

If an exception or interrupt prevents the completion of a legal
instruction during a delay slot, the hardware sets the EPC register to
point at the jump or branch instruction which precedes it. When the code
is restarted, both the jump or branch instruction and the instruction in
the delay slot are reexecuted.
Because jump and branch instructions may be restarted after exceptions
or interrupts, they must be restartable. Therefore, when a jump
or branch instruction stores a return link value, register 31 (the register
in which the link is stored) may not be used as a source register.
Since instructions must be word-aligned, a Jump Register or Jump and
Link Register instruction must use a register whose two low-order bits
are zero. If these low-order bits are not zero, an address exception will
occur when the jump target instruction is subsequently fetched.
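The word-alignment requirement on a Jump Register target amounts to a
check of the two low-order address bits. The sketch below only illustrates
that check; the function name is an assumption, and the hardware raises
the address exception at fetch time rather than at the jump itself.

```python
def jump_target_is_word_aligned(address: int) -> bool:
    """True when the two low-order bits of the register value are zero."""
    return (address & 0x3) == 0


if __name__ == "__main__":
    for addr in (0x80001000, 0x80001002):
        print(hex(addr), jump_target_is_word_aligned(addr))
```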



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Coprocessor Instructions 

The MIPS architecture provides four coprocessor units, or classes.
Coprocessors are alternate execution units, which have separate register
files from the CPU. R-Series coprocessors have 2 register spaces, each
with thirty-two 32-bit registers. The first space, coprocessor general
registers, may be directly loaded from memory and stored into memory,
and their contents may be transferred between the coprocessor and
processor. The second, coprocessor control registers, may only have
their contents transferred directly between the coprocessor and
processor. Coprocessor instructions may alter registers in either space.

Normally, by convention, Coprocessor Control Register 0 is interpreted
as a Coprocessor Implementation And Revision register. However, the
system control coprocessor (CP0) uses Coprocessor General Register 15
for the processor/coprocessor revision register. The register's
low-order byte (bits 7..0) is interpreted as a coprocessor unit revision
number. The second byte (bits 15..8) is interpreted as a coprocessor unit
implementation descriptor. The revision number is a value of the form
y.x where y is a major revision number in bits 7..4 and x is a minor
revision number in bits 3..0.

The contents of the high-order halfword of the register are not defined
(currently read as 0 and should be 0 when written).
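The layout of the implementation and revision register described above
can be sketched as a small decoder. The function name and the sample
register value are arbitrary illustrations and are not taken from any
particular device.

```python
def decode_revision_register(value: int) -> dict:
    """Split the register into implementation, major and minor revision fields."""
    return {
        "implementation": (value >> 8) & 0xFF,  # bits 15..8
        "major": (value >> 4) & 0xF,            # bits 7..4 (y in y.x)
        "minor": value & 0xF,                   # bits 3..0 (x in y.x)
    }


if __name__ == "__main__":
    fields = decode_revision_register(0x0430)   # arbitrary sample value
    print(fields, "revision %d.%d" % (fields["major"], fields["minor"]))
```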

System Control Coprocessor (CPO) Instructions 

There are some special limitations imposed on operations involving
CP0, which is incorporated within the CPU. Although load and store
instructions to transfer data to and from coprocessors and move control
to/from coprocessor instructions are generally permitted by the MIPS
architecture, CP0 is given a somewhat protected status since it has
responsibility for exception handling and memory management. Therefore,
the move to/from coprocessor instructions are the only valid
mechanism for reading from and writing to the CP0 registers.
Several coprocessor operation instructions are defined for CP0 to
directly read, write, and probe TLB entries and to modify the operating
modes in preparation for returning to User mode or interrupt-enabled
states.
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ADD 



Add 



ADD 





 31       26 25    21 20    16 15    11 10     6 5         0
 SPECIAL     rs       rt       rd       0        ADD
 000000                                 00000    100000
     6          5        5        5        5        6

Format:

ADD rd, rs, rt













Description: 

The contents of general register rs and the contents of general register
rt are added to form the result. The result is placed into general
register rd. In 64-bit mode, the operands must be valid sign-extended,
32-bit values.

An overflow exception occurs if the carries out of bits 30 and 31 differ
(2's-complement overflow). The destination register rd is not
modified when an integer overflow exception occurs.

Operation: 



32   T:   GPR[rd] <- GPR[rs] + GPR[rt]

64   T:   temp <- GPR[rs] + GPR[rt]
          GPR[rd] <- (temp_31)^32 || temp_31..0



Exceptions: 

Integer overflow exception 
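The ADD operation and its overflow rule can be modelled in a few lines.
This is a behavioural sketch rather than a description of the hardware:
it detects overflow by checking whether the signed sum fits in 32 bits,
which is equivalent to the carries out of bits 30 and 31 differing. The
exception class and function names are assumptions.

```python
class IntegerOverflow(Exception):
    """Stands in for the integer overflow exception."""


def sign_extend_32(value: int) -> int:
    """Interpret the low 32 bits of value as a two's complement integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def add32(rs_value: int, rt_value: int) -> int:
    """32-bit mode ADD: return the new rd value or raise IntegerOverflow."""
    total = sign_extend_32(rs_value) + sign_extend_32(rt_value)
    if not -(1 << 31) <= total < (1 << 31):
        raise IntegerOverflow("rd is left unmodified")
    return total & 0xFFFFFFFF


def add64(rs_value: int, rt_value: int) -> int:
    """64-bit mode ADD: the 32-bit result sign-extended to 64 bits."""
    return sign_extend_32(add32(rs_value, rt_value)) & 0xFFFFFFFFFFFFFFFF


if __name__ == "__main__":
    print(hex(add32(1, 2)))
    print(hex(add64(0xFFFFFFFFFFFFFFFF, 0)))
```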
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Add Immediate 



ADDI 





 31       26 25    21 20    16 15                          0
 ADDI        rs       rt       immediate
 001000
     6          5        5              16

Format:

ADDI rt, rs, immediate



Description: 

The 16-bit immediate is sign-extended and added to the contents of
general register rs to form the result. The result is placed into general
register rt. In 64-bit mode, the operand must be a valid sign-extended,
32-bit value.

An overflow exception occurs if carries out of bits 30 and 31 differ
(2's-complement overflow). The destination register rt is not modified
when an integer overflow exception occurs.

Operation: 



32 


T: 


GPR [rt] <- GPR[rs] +(immediate 15 ) 16 1| immediate 15 


64 


T: 


temp «- GPR[rs] + (immediate 15 ) 48 || immediate 15 _.o 
GPR[rt] «- (temp 31 ) 32 || temp 31 .. 



Exceptions: 

Integer overflow exception 
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ADDIU 



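Add Immediate Unsigned

 31      26 25    21 20    16 15                        0
+----------+--------+--------+---------------------------+
|  ADDIU   |   rs   |   rt   |         immediate         |
|  001001  |        |        |                           |
+----------+--------+--------+---------------------------+
     6         5        5                 16

Format:

ADDIU rt, rs, immediate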

Description: 

The 16-bit immediate is sign-extended and added to the contents of
general register rs to form the result. The result is placed into general
register rt. No integer overflow exception occurs under any
circumstances. In 64-bit mode, the operand must be a valid sign-extended,
32-bit value.

The only difference between this instruction and the ADDI instruction
is that ADDIU never causes an overflow exception.

Operation:

32   T:    GPR[rt] <- GPR[rs] + (immediate_15)^16 || immediate_15..0

64   T:    temp <- GPR[rs] + (immediate_15)^48 || immediate_15..0
           GPR[rt] <- (temp_31)^32 || temp_31..0



Exceptions: 

None 
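
As a rough illustration of the 64-bit operation above, the Python sketch below sign-extends the 16-bit immediate, adds it to rs, and then sign-extends bits 31..0 of the sum into the destination. The helper names `sign_extend` and `addiu64` are hypothetical, chosen for this example only.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def sign_extend(value: int, bits: int, width: int) -> int:
    """Replicate bit (bits - 1) of value up to a total of width bits."""
    value &= (1 << bits) - 1
    if value >> (bits - 1):
        # Set every bit from position `bits` up to position `width - 1`.
        value |= ((1 << width) - 1) ^ ((1 << bits) - 1)
    return value


def addiu64(rs: int, immediate: int) -> int:
    """Model the 64-bit ADDIU data path (no overflow check is made)."""
    # temp <- GPR[rs] + (immediate_15)^48 || immediate_15..0
    temp = (rs + sign_extend(immediate, 16, 64)) & MASK64
    # GPR[rt] <- (temp_31)^32 || temp_31..0
    return sign_extend(temp & 0xFFFFFFFF, 32, 64)
```

Despite the "unsigned" in the mnemonic, an immediate of 0xFFFF acts as -1 here: `addiu64(5, 0xFFFF)` yields 4.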
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ADDU 



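Add Unsigned

 31      26 25    21 20    16 15    11 10     6 5        0
+----------+--------+--------+--------+--------+----------+
| SPECIAL  |   rs   |   rt   |   rd   | 00000  |   ADDU   |
|  000000  |        |        |        |        |  100001  |
+----------+--------+--------+--------+--------+----------+
     6         5        5        5        5         6

Format:

ADDU rd, rs, rt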



Description: 

The contents of general register rs and the contents of general register
rt are added to form the result. The result is placed into general
register rd. No overflow exception occurs under any circumstances. In
64-bit mode, the operands must be valid sign-extended, 32-bit values.

The only difference between this instruction and the ADD instruction
is that ADDU never causes an overflow exception.

Operation:

32   T:    GPR[rd] <- GPR[rs] + GPR[rt]

64   T:    temp <- GPR[rs] + GPR[rt]
           GPR[rd] <- (temp_31)^32 || temp_31..0



Exceptions: 

None 
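
To make the contrast with ADD concrete, the short Python sketch below models ADDU in both modes. The function names `addu32` and `addu64` are assumptions made for this example. The sum simply wraps around, and in 64-bit mode bit 31 of the sum is replicated into the upper half of rd.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def addu32(rs: int, rt: int) -> int:
    """32-bit ADDU: the sum wraps modulo 2**32 and never traps."""
    return (rs + rt) & MASK32


def addu64(rs: int, rt: int) -> int:
    """64-bit-mode ADDU: keep bits 31..0 of the sum, sign-extended to 64."""
    low = (rs + rt) & MASK32
    upper = 0xFFFFFFFF00000000 if low >> 31 else 0
    return (upper | low) & MASK64
```

For example, 0x7FFFFFFF plus 1 gives 0x80000000 in 32-bit mode and 0xFFFFFFFF80000000 in 64-bit mode, where ADD would have raised an integer overflow exception.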
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AND 



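And

 31      26 25    21 20    16 15    11 10     6 5        0
+----------+--------+--------+--------+--------+----------+
| SPECIAL  |   rs   |   rt   |   rd   | 00000  |   AND    |
|  000000  |        |        |        |        |  100100  |
+----------+--------+--------+--------+--------+----------+
     6         5        5        5        5         6

Format:

AND rd, rs, rt
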
Description: 

The contents of general register rs are combined with the contents of 
general register rt in a bit-wise logical AND operation. The result is 
placed into general register rd. 
Operation: 



32   T:    GPR[rd] <- GPR[rs] and GPR[rt]

64   T:    GPR[rd] <- GPR[rs] and GPR[rt]



Exceptions: 

None 
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ANDI 



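And Immediate

 31      26 25    21 20    16 15                        0
+----------+--------+--------+---------------------------+
|   ANDI   |   rs   |   rt   |         immediate         |
|  001100  |        |        |                           |
+----------+--------+--------+---------------------------+
     6         5        5                 16

Format:

ANDI rt, rs, immediate
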
Description: 

The 16-bit immediate is zero-extended and combined with the contents 
of general register rs in a bit-wise logical AND operation. The result is 
placed into general register rt. 

Operation: 



32   T:    GPR[rt] <- 0^16 || (immediate and GPR[rs]_15..0)

64   T:    GPR[rt] <- 0^48 || (immediate and GPR[rs]_15..0)



Exceptions: 

None 
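
Because the immediate is zero-extended rather than sign-extended, ANDI always clears every bit of the result above bit 15. A minimal Python sketch of this behaviour follows; the function name `andi` and its `width` parameter are assumptions for illustration.

```python
def andi(rs: int, immediate: int, width: int = 32) -> int:
    """Zero-extend the 16-bit immediate and AND it with rs.

    width is 32 or 64, matching the two operation modes.
    """
    zero_extended = immediate & 0xFFFF  # upper bits are 0^16 or 0^48
    return (rs & zero_extended) & ((1 << width) - 1)
```

With rs holding all ones and an immediate of 0x8000, the result is 0x8000, not a sign-extended value.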
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BCzF


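Branch On Coprocessor z False

 31      26 25    21 20    16 15                        0
+----------+--------+--------+---------------------------+
|   COPz   |   BC   |  BCF   |          offset           |
| 0100xx*  | 01000  | 00000  |                           |
+----------+--------+--------+---------------------------+
     6         5        5                 16

Format:

BCzF offset
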
Description: 

A branch target address is computed from the sum of the address of 
the instruction in the delay slot and the 16-bit offset, shifted left two 
bits and sign-extended. If coprocessor z's condition signal (CpCond), 
as sampled during the previous instruction, is false, then the program 
branches to the target address with a delay of one instruction. 
Because the condition line is sampled during the previous instruction, 
there must be at least one instruction between this instruction and a 
coprocessor instruction that changes the condition line. 

Operation: 



32   T-1:  condition <- not COC[z]
     T:    target <- (offset_15)^14 || offset || 0^2
     T+1:  if condition then
               PC <- PC + target
           endif

64   T-1:  condition <- not COC[z]
     T:    target <- (offset_15)^38 || offset || 0^2
     T+1:  if condition then
               PC <- PC + target
           endif



*See the table "Opcode Bit Encoding" on next page, or "CPU Instruc- 
tion Opcode Bit Encoding" at the end of Appendix A. 



BCzF        Branch On Coprocessor z False (continued)        BCzF



Exceptions: 

Coprocessor unusable exception 
Opcode Bit Encoding: 



BCzF    Bit #  31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16

BC0F            0  1  0  0  0  0  0  1  0  0  0  0  0  0  0  0
BC1F            0  1  0  0  0  1  0  1  0  0  0  0  0  0  0  0
BC2F            0  1  0  0  1  0  0  1  0  0  0  0  0  0  0  0
BC3F            0  1  0  0  1  1  0  1  0  0  0  0  0  0  0  0

Bits 31..28   Opcode
Bits 27..26   Coprocessor Unit Number
Bits 25..21   BC sub-opcode
Bits 20..16   Branch condition
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BCzFL        Branch On Coprocessor z False Likely        BCzFL



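 31      26 25    21 20    16 15                        0
+----------+--------+--------+---------------------------+
|   COPz   |   BC   |  BCFL  |          offset           |
| 0100xx*  | 01000  | 00010  |                           |
+----------+--------+--------+---------------------------+
     6         5        5                 16

Format:

BCzFL offset
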
Description: 

A branch target address is computed from the sum of the address of
the instruction in the delay slot and the 16-bit offset, shifted left two
bits and sign-extended. If coprocessor z's condition line, as sampled
during the previous instruction, is false, then the program branches to
the target address with a delay of one instruction.

If the conditional branch is not taken, the instruction in the branch
delay slot is nullified.

Because the condition line is sampled during the previous instruction,
there must be at least one instruction between this instruction and a
coprocessor instruction that changes the condition line.



*See the table "Opcode Bit Encoding" on next page, or "CPU Instruc- 
tion Opcode Bit Encoding" at the end of Appendix A. 
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BCzFL        Branch On Coprocessor z False Likely (continued)        BCzFL



Operation: 



32   T-1:  condition <- not COC[z]
     T:    target <- (offset_15)^14 || offset || 0^2
     T+1:  if condition then
               PC <- PC + target
           else
               NullifyCurrentInstruction
           endif

64   T-1:  condition <- not COC[z]
     T:    target <- (offset_15)^38 || offset || 0^2
     T+1:  if condition then
               PC <- PC + target
           else
               NullifyCurrentInstruction
           endif



Exceptions: 

Coprocessor unusable exception 
Opcode Bit Encoding: 



BCzFL   Bit #  31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16

BC0FL           0  1  0  0  0  0  0  1  0  0  0  0  0  0  1  0
BC1FL           0  1  0  0  0  1  0  1  0  0  0  0  0  0  1  0
BC2FL           0  1  0  0  1  0  0  1  0  0  0  0  0  0  1  0
BC3FL           0  1  0  0  1  1  0  1  0  0  0  0  0  0  1  0

Bits 31..28   Opcode
Bits 27..26   Coprocessor Unit Number
Bits 25..21   BC sub-opcode
Bits 20..16   Branch condition
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BCzT        Branch On Coprocessor z True        BCzT



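 31      26 25    21 20    16 15                        0
+----------+--------+--------+---------------------------+
|   COPz   |   BC   |  BCT   |          offset           |
| 0100xx*  | 01000  | 00001  |                           |
+----------+--------+--------+---------------------------+
     6         5        5                 16

Format:

BCzT offset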

Description: 

A branch target address is computed from the sum of the address of
the instruction in the delay slot and the 16-bit offset, shifted left two
bits and sign-extended. If coprocessor z's condition signal (CpCond),
as sampled during the previous instruction, is true, then the program
branches to the target address, with a delay of one instruction.

Because the condition line is sampled during the previous instruction, 
there must be at least one instruction between this instruction and a 
coprocessor instruction that changes the condition line. 

Operation: 



32   T-1:  condition <- COC[z]
     T:    target <- (offset_15)^14 || offset || 0^2
     T+1:  if condition then
               PC <- PC + target
           endif

64   T-1:  condition <- COC[z]
     T:    target <- (offset_15)^38 || offset || 0^2
     T+1:  if condition then
               PC <- PC + target
           endif



*See the table "Opcode Bit Encoding" on next page, or "CPU Instruction
Opcode Bit Encoding" at the end of Appendix A.
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Branch On Coprocessor z True 
(continued) 



BCzT 



Exceptions: 

Coprocessor unusable exception 
Opcode Bit Encoding: 



BCzT    Bit #  31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16

BC0T            0  1  0  0  0  0  0  1  0  0  0  0  0  0  0  1
BC1T            0  1  0  0  0  1  0  1  0  0  0  0  0  0  0  1
BC2T            0  1  0  0  1  0  0  1  0  0  0  0  0  0  0  1
BC3T            0  1  0  0  1  1  0  1  0  0  0  0  0  0  0  1

Bits 31..28   Opcode
Bits 27..26   Coprocessor Unit Number
Bits 25..21   BC sub-opcode
Bits 20..16   Branch condition
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BCzTL        Branch On Coprocessor z True Likely        BCzTL



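 31      26 25    21 20    16 15                        0
+----------+--------+--------+---------------------------+
|   COPz   |   BC   |  BCTL  |          offset           |
| 0100xx*  | 01000  | 00011  |                           |
+----------+--------+--------+---------------------------+
     6         5        5                 16

Format:

BCzTL offset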

Description: 

A branch target address is computed from the sum of the address of
the instruction in the delay slot and the 16-bit offset, shifted left two
bits and sign-extended. If coprocessor z's condition line, as sampled
during the previous instruction, is true, then the program branches to
the target address with a delay of one instruction.

If the conditional branch is not taken, the instruction in the branch
delay slot is nullified.

Because the condition line is sampled during the previous instruction, 
there must be at least one instruction between this instruction and a 
coprocessor instruction that changes the condition line. 

Operation: 



32   T-1:  condition <- COC[z]
     T:    target <- (offset_15)^14 || offset || 0^2
     T+1:  if condition then
               PC <- PC + target
           else
               NullifyCurrentInstruction
           endif

64   T-1:  condition <- COC[z]
     T:    target <- (offset_15)^38 || offset || 0^2
     T+1:  if condition then
               PC <- PC + target
           else
               NullifyCurrentInstruction
           endif



*See the table "Opcode Bit Encoding" on next page, or "CPU Instruction
Opcode Bit Encoding" at the end of Appendix A.
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(continued) 



Exceptions: 

Coprocessor unusable exception 

Opcode Bit Encoding: 



BCzTL   Bit #  31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16

BC0TL           0  1  0  0  0  0  0  1  0  0  0  0  0  0  1  1
BC1TL           0  1  0  0  0  1  0  1  0  0  0  0  0  0  1  1
BC2TL           0  1  0  0  1  0  0  1  0  0  0  0  0  0  1  1
BC3TL           0  1  0  0  1  1  0  1  0  0  0  0  0  0  1  1

Bits 31..28   Opcode
Bits 27..26   Coprocessor Unit Number
Bits 25..21   BC sub-opcode
Bits 20..16   Branch condition
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BEQ 



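Branch On Equal

 31      26 25    21 20    16 15                        0
+----------+--------+--------+---------------------------+
|   BEQ    |   rs   |   rt   |          offset           |
|  000100  |        |        |                           |
+----------+--------+--------+---------------------------+
     6         5        5                 16

Format:

BEQ rs, rt, offset
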
Description: 

A branch target address is computed from the sum of the address of 
the instruction in the delay slot and the 16-bit offset, shifted left two 
bits and sign-extended. The contents of general register rs and the con- 
tents of general register rt are compared. If the two registers are equal, 
then the program branches to the target address, with a delay of one 
instruction. 

Operation: 



32   T:    target <- (offset_15)^14 || offset || 0^2
           condition <- (GPR[rs] = GPR[rt])
     T+1:  if condition then
               PC <- PC + target
           endif

64   T:    target <- (offset_15)^38 || offset || 0^2
           condition <- (GPR[rs] = GPR[rt])
     T+1:  if condition then
               PC <- PC + target
           endif



Exceptions: 

None 
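
The target computation shared by all the conditional branches in this appendix can be sketched as follows. This Python model is an illustration, not part of the manual: the function name `branch_target` and its parameters are assumptions. It takes the address of the branch itself, derives the delay-slot address, and adds the offset after shifting it left two bits and sign-extending it.

```python
def branch_target(branch_pc: int, offset: int, width: int = 32) -> int:
    """Return the address reached when the branch at branch_pc is taken.

    offset is the raw 16-bit field from the instruction word.
    """
    mask = (1 << width) - 1
    # Shift left two bits, then sign-extend the resulting 18-bit value.
    disp = (offset & 0xFFFF) << 2
    if disp & 0x20000:
        disp -= 0x40000
    delay_slot_pc = branch_pc + 4  # the instruction after the branch
    return (delay_slot_pc + disp) & mask
```

An offset of 0xFFFF (that is, -1) therefore branches back to the branch instruction itself, and an offset of 0 targets the delay-slot instruction.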
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Branch On Equal Likely 



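BEQL

 31      26 25    21 20    16 15                        0
+----------+--------+--------+---------------------------+
|   BEQL   |   rs   |   rt   |          offset           |
|  010100  |        |        |                           |
+----------+--------+--------+---------------------------+
     6         5        5                 16

Format:

BEQL rs, rt, offset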

Description: 

A branch target address is computed from the sum of the address of
the instruction in the delay slot and the 16-bit offset, shifted left two
bits and sign-extended. The contents of general register rs and the
contents of general register rt are compared. If the two registers are
equal, then the program branches to the target address, with a delay of
one instruction. If the conditional branch is not taken, the instruction
in the branch delay slot is nullified.



Operation: 




32   T:    target <- (offset_15)^14 || offset || 0^2
           condition <- (GPR[rs] = GPR[rt])
     T+1:  if condition then
               PC <- PC + target
           else
               NullifyCurrentInstruction
           endif

64   T:    target <- (offset_15)^38 || offset || 0^2
           condition <- (GPR[rs] = GPR[rt])
     T+1:  if condition then
               PC <- PC + target
           else
               NullifyCurrentInstruction
           endif



Exceptions: 

None 
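
The difference between BEQ and BEQL lies only in what happens to the delay-slot instruction when the branch is not taken. The Python sketch below states that rule explicitly; the function name `beq_flow` and its return convention are assumptions made for this example.

```python
def beq_flow(rs_val: int, rt_val: int, likely: bool) -> tuple:
    """Return (delay_slot_executes, branch_taken) for BEQ or BEQL.

    likely=False models BEQ; likely=True models BEQL.
    """
    taken = rs_val == rt_val
    # BEQ always executes the delay slot. BEQL nullifies it when the
    # branch is not taken.
    delay_slot_executes = taken or not likely
    return delay_slot_executes, taken
```

Only the combination "likely branch, not taken" suppresses the delay-slot instruction.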
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Branch On Greater Than 
Or Equal To Zero 



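 31      26 25    21 20    16 15                        0
+----------+--------+--------+---------------------------+
|  REGIMM  |   rs   |  BGEZ  |          offset           |
|  000001  |        | 00001  |                           |
+----------+--------+--------+---------------------------+
     6         5        5                 16

Format:

BGEZ rs, offset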



Description: 

A branch target address is computed from the sum of the address of 
the instruction in the delay slot and the 16-bit offset, shifted left two 
bits and sign-extended. If the contents of general register rs have the 
sign bit cleared, then the program branches to the target address, with 
a delay of one instruction. 
Operation: 



32   T:    target <- (offset_15)^14 || offset || 0^2
           condition <- (GPR[rs]_31 = 0)
     T+1:  if condition then
               PC <- PC + target
           endif

64   T:    target <- (offset_15)^38 || offset || 0^2
           condition <- (GPR[rs]_63 = 0)
     T+1:  if condition then
               PC <- PC + target
           endif



Exceptions:

None 
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BGEZAL 





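Branch On Greater Than Or Equal To Zero And Link

 31      26 25    21 20    16 15                        0
+----------+--------+--------+---------------------------+
|  REGIMM  |   rs   | BGEZAL |          offset           |
|  000001  |        | 10001  |                           |
+----------+--------+--------+---------------------------+
     6         5        5                 16

Format:

BGEZAL rs, offset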



Description: 

A branch target address is computed from the sum of the address of
the instruction in the delay slot and the 16-bit offset, shifted left two
bits and sign-extended. Unconditionally, the address of the instruction
after the delay slot is placed in the link register, r31. If the contents
of general register rs have the sign bit cleared, then the program
branches to the target address, with a delay of one instruction.

General register rs may not be general register 31, because such an
instruction is not restartable. An attempt to execute this instruction
with register 31 specified as rs is not trapped, however.

Operation: 



32   T:    target <- (offset_15)^14 || offset || 0^2
           condition <- (GPR[rs]_31 = 0)
           GPR[31] <- PC + 8
     T+1:  if condition then
               PC <- PC + target
           endif

64   T:    target <- (offset_15)^38 || offset || 0^2
           condition <- (GPR[rs]_63 = 0)
           GPR[31] <- PC + 8
     T+1:  if condition then
               PC <- PC + target
           endif



Exceptions: 

None 
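
The link behaviour can be illustrated with a small 32-bit model. In this Python sketch the function name `bgezal`, the list-based register file `gpr`, and the returned "next PC after the delay slot" are all assumptions for illustration. Note that r31 is written whether or not the branch is taken.

```python
MASK32 = 0xFFFFFFFF


def bgezal(pc: int, rs_val: int, offset: int, gpr: list) -> int:
    """Model a 32-bit BGEZAL located at address pc.

    Writes the link address to gpr[31] and returns the address of the
    instruction executed after the delay slot.
    """
    # Unconditional link: the instruction after the delay slot.
    gpr[31] = (pc + 8) & MASK32
    # Offset shifted left two bits and sign-extended.
    disp = (offset & 0xFFFF) << 2
    if disp & 0x20000:
        disp -= 0x40000
    taken = ((rs_val >> 31) & 1) == 0  # sign bit cleared
    if taken:
        return (pc + 4 + disp) & MASK32  # delay-slot address + target
    return (pc + 8) & MASK32
```

A subroutine reached this way returns to the value left in r31, which is why the link address skips both the branch and its delay slot.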




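BGEZALL

Branch On Greater Than Or Equal To Zero And Link Likely

 31      26 25    21 20    16 15                        0
+----------+--------+--------+---------------------------+
|  REGIMM  |   rs   |BGEZALL |          offset           |
|  000001  |        | 10011  |                           |
+----------+--------+--------+---------------------------+
     6         5        5                 16

Format:

BGEZALL rs, offset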
Description: 

A branch target address is computed from the sum of the address of
the instruction in the delay slot and the 16-bit offset, shifted left two
bits and sign-extended. Unconditionally, the address of the instruction
after the delay slot is placed in the link register, r31. If the contents
of general register rs have the sign bit cleared, then the program
branches to the target address, with a delay of one instruction.

General register rs may not be general register 31, because such an
instruction is not restartable. An attempt to execute this instruction
with register 31 specified as rs is not trapped, however. If the
conditional branch is not taken, the instruction in the branch delay
slot is nullified.

Operation: 



32   T:    target <- (offset_15)^14 || offset || 0^2
           condition <- (GPR[rs]_31 = 0)
           GPR[31] <- PC + 8
     T+1:  if condition then
               PC <- PC + target
           else
               NullifyCurrentInstruction
           endif

64   T:    target <- (offset_15)^38 || offset || 0^2
           condition <- (GPR[rs]_63 = 0)
           GPR[31] <- PC + 8
     T+1:  if condition then
               PC <- PC + target
           else
               NullifyCurrentInstruction
           endif



Exceptions: 

None 
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BGEZL 



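Branch On Greater Than Or Equal To Zero Likely

 31      26 25    21 20    16 15                        0
+----------+--------+--------+---------------------------+
|  REGIMM  |   rs   | BGEZL  |          offset           |
|  000001  |        | 00011  |                           |
+----------+--------+--------+---------------------------+
     6         5        5                 16

Format:

BGEZL rs, offset
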
Description: 

A branch target address is computed from the sum of the address of 
the instruction in the delay slot and the 16-bit offset, shifted left two 
bits and sign-extended. If the contents of general register rs have the 
sign bit cleared, then the program branches to the target address, with 
a delay of one instruction. If the conditional branch is not taken, the in- 
struction in the branch delay slot is nullified. 

Operation: 



32   T:    target <- (offset_15)^14 || offset || 0^2
           condition <- (GPR[rs]_31 = 0)
     T+1:  if condition then
               PC <- PC + target
           else
               NullifyCurrentInstruction
           endif

64   T:    target <- (offset_15)^38 || offset || 0^2
           condition <- (GPR[rs]_63 = 0)
     T+1:  if condition then
               PC <- PC + target
           else
               NullifyCurrentInstruction
           endif



Exceptions: 

None 
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BGTZ 



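Branch On Greater Than Zero

 31      26 25    21 20    16 15                        0
+----------+--------+--------+---------------------------+
|   BGTZ   |   rs   | 00000  |          offset           |
|  000111  |        |        |                           |
+----------+--------+--------+---------------------------+
     6         5        5                 16

Format:

BGTZ rs, offset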

Description: 

A branch target address is computed from the sum of the address of 
the instruction in the delay slot and the 16-bit offset, shifted left two 
bits and sign-extended. The contents of general register rs are com- 
pared to zero. If the contents of general register rs have the sign bit 
cleared and are not equal to zero, then the program branches to the 
target address, with a delay of one instruction. 

Operation: 



32   T:    target <- (offset_15)^14 || offset || 0^2
           condition <- (GPR[rs]_31 = 0) and (GPR[rs] ≠ 0^32)
     T+1:  if condition then
               PC <- PC + target
           endif

64   T:    target <- (offset_15)^38 || offset || 0^2
           condition <- (GPR[rs]_63 = 0) and (GPR[rs] ≠ 0^64)
     T+1:  if condition then
               PC <- PC + target
           endif



Exceptions: 

None 
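
The four compare-with-zero branches (BGEZ, BGTZ, BLEZ, BLTZ) all derive their condition from the sign bit of rs, combined in two cases with a test for zero. The Python sketch below collects these predicates side by side. The function names and the `width` parameter (32 or 64, selecting bit 31 or bit 63 as the sign bit) are assumptions for illustration.

```python
def sign_bit(value: int, width: int = 32) -> int:
    """Return bit (width - 1) of a register image."""
    return (value >> (width - 1)) & 1


def is_zero(value: int, width: int = 32) -> bool:
    """True when all width bits of the register image are zero."""
    return (value & ((1 << width) - 1)) == 0


def bgez(value: int, width: int = 32) -> bool:
    return sign_bit(value, width) == 0


def bgtz(value: int, width: int = 32) -> bool:
    return sign_bit(value, width) == 0 and not is_zero(value, width)


def blez(value: int, width: int = 32) -> bool:
    return sign_bit(value, width) == 1 or is_zero(value, width)


def bltz(value: int, width: int = 32) -> bool:
    return sign_bit(value, width) == 1
```

For any register value, exactly one of `bgtz` and `blez` holds, and exactly one of `bgez` and `bltz` holds.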
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BGTZL 



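Branch On Greater Than Zero Likely

 31      26 25    21 20    16 15                        0
+----------+--------+--------+---------------------------+
|  BGTZL   |   rs   | 00000  |          offset           |
|  010111  |        |        |                           |
+----------+--------+--------+---------------------------+
     6         5        5                 16

Format:

BGTZL rs, offset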



Description: 

A branch target address is computed from the sum of the address of
the instruction in the delay slot and the 16-bit offset, shifted left two
bits and sign-extended. The contents of general register rs are
compared to zero. If the contents of general register rs have the sign bit
cleared and are not equal to zero, then the program branches to the
target address, with a delay of one instruction. If the conditional
branch is not taken, the instruction in the branch delay slot is nullified.

Operation: 



32   T:    target <- (offset_15)^14 || offset || 0^2
           condition <- (GPR[rs]_31 = 0) and (GPR[rs] ≠ 0^32)
     T+1:  if condition then
               PC <- PC + target
           else
               NullifyCurrentInstruction
           endif

64   T:    target <- (offset_15)^38 || offset || 0^2
           condition <- (GPR[rs]_63 = 0) and (GPR[rs] ≠ 0^64)
     T+1:  if condition then
               PC <- PC + target
           else
               NullifyCurrentInstruction
           endif



Exceptions: 

None 
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BLEZ



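Branch On Less Than Or Equal To Zero

 31      26 25    21 20    16 15                        0
+----------+--------+--------+---------------------------+
|   BLEZ   |   rs   | 00000  |          offset           |
|  000110  |        |        |                           |
+----------+--------+--------+---------------------------+
     6         5        5                 16

Format:

BLEZ rs, offset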



Description: 

A branch target address is computed from the sum of the address of 
the instruction in the delay slot and the 16-bit offset, shifted left two 
bits and sign-extended. The contents of general register rs are com- 
pared to zero. If the contents of general register rs have the sign bit set, 
or are equal to zero, then the program branches to the target address, 
with a delay of one instruction. 
Operation: 



32   T:    target <- (offset_15)^14 || offset || 0^2
           condition <- (GPR[rs]_31 = 1) or (GPR[rs] = 0^32)
     T+1:  if condition then
               PC <- PC + target
           endif

64   T:    target <- (offset_15)^38 || offset || 0^2
           condition <- (GPR[rs]_63 = 1) or (GPR[rs] = 0^64)
     T+1:  if condition then
               PC <- PC + target
           endif



Exceptions:

None
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BLEZL 





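Branch On Less Than Or Equal To Zero Likely

 31      26 25    21 20    16 15                        0
+----------+--------+--------+---------------------------+
|  BLEZL   |   rs   | 00000  |          offset           |
|  010110  |        |        |                           |
+----------+--------+--------+---------------------------+
     6         5        5                 16

Format:

BLEZL rs, offset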
Description: 

A branch target address is computed from the sum of the address of
the instruction in the delay slot and the 16-bit offset, shifted left two
bits and sign-extended. The contents of general register rs are compared
to zero. If the contents of general register rs have the sign bit set, or are
equal to zero, then the program branches to the target address, with a
delay of one instruction.

If the conditional branch is not taken, the instruction in the branch
delay slot is nullified.

Operation: 



32   T:    target <- (offset_15)^14 || offset || 0^2
           condition <- (GPR[rs]_31 = 1) or (GPR[rs] = 0^32)
     T+1:  if condition then
               PC <- PC + target
           else
               NullifyCurrentInstruction
           endif

64   T:    target <- (offset_15)^38 || offset || 0^2
           condition <- (GPR[rs]_63 = 1) or (GPR[rs] = 0^64)
     T+1:  if condition then
               PC <- PC + target
           else
               NullifyCurrentInstruction
           endif



Exceptions: 

None 
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BLTZ 



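Branch On Less Than Zero

 31      26 25    21 20    16 15                        0
+----------+--------+--------+---------------------------+
|  REGIMM  |   rs   |  BLTZ  |          offset           |
|  000001  |        | 00000  |                           |
+----------+--------+--------+---------------------------+
     6         5        5                 16

Format:

BLTZ rs, offset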



Description: 

A branch target address is computed from the sum of the address of 
the instruction in the delay slot and the 16-bit offset, shifted left two 
bits and sign-extended. If the contents of general register rs have the 
sign bit set, then the program branches to the target address, with a 
delay of one instruction. 
Operation: 



32   T:    target <- (offset_15)^14 || offset || 0^2
           condition <- (GPR[rs]_31 = 1)
     T+1:  if condition then
               PC <- PC + target
           endif

64   T:    target <- (offset_15)^38 || offset || 0^2
           condition <- (GPR[rs]_63 = 1)
     T+1:  if condition then
               PC <- PC + target
           endif



Exceptions: 

None 
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BLTZAL 





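Branch On Less Than Zero And Link

 31      26 25    21 20    16 15                        0
+----------+--------+--------+---------------------------+
|  REGIMM  |   rs   | BLTZAL |          offset           |
|  000001  |        | 10000  |                           |
+----------+--------+--------+---------------------------+
     6         5        5                 16

Format:

BLTZAL rs, offset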
Description: 

A branch target address is computed from the sum of the address of
the instruction in the delay slot and the 16-bit offset, shifted left two
bits and sign-extended. Unconditionally, the address of the instruction
after the delay slot is placed in the link register, r31. If the contents
of general register rs have the sign bit set, then the program branches
to the target address, with a delay of one instruction.

General register rs may not be general register 31, because such an
instruction is not restartable. An attempt to execute this instruction with
register 31 specified as rs is not trapped, however.

Operation: 



32   T:    target <- (offset_15)^14 || offset || 0^2
           condition <- (GPR[rs]_31 = 1)
           GPR[31] <- PC + 8
     T+1:  if condition then
               PC <- PC + target
           endif

64   T:    target <- (offset_15)^38 || offset || 0^2
           condition <- (GPR[rs]_63 = 1)
           GPR[31] <- PC + 8
     T+1:  if condition then
               PC <- PC + target
           endif



Exceptions: 

None 
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BLTZALL        Branch On Less Than Zero And Link Likely        BLTZALL



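 31      26 25    21 20    16 15                        0
+----------+--------+--------+---------------------------+
|  REGIMM  |   rs   |BLTZALL |          offset           |
|  000001  |        | 10010  |                           |
+----------+--------+--------+---------------------------+
     6         5        5                 16

Format:

BLTZALL rs, offset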

Description: 

A branch target address is computed from the sum of the address of
the instruction in the delay slot and the 16-bit offset, shifted left two
bits and sign-extended. Unconditionally, the address of the instruction
after the delay slot is placed in the link register, r31. If the contents
of general register rs have the sign bit set, then the program branches
to the target address, with a delay of one instruction.

General register rs may not be general register 31, because such an
instruction is not restartable. An attempt to execute this instruction with
register 31 specified as rs is not trapped, however. If the conditional
branch is not taken, the instruction in the branch delay slot is nullified.

Operation: 



32   T:    target <- (offset_15)^14 || offset || 0^2
           condition <- (GPR[rs]_31 = 1)
           GPR[31] <- PC + 8
     T+1:  if condition then
               PC <- PC + target
           else
               NullifyCurrentInstruction
           endif

64   T:    target <- (offset_15)^38 || offset || 0^2
           condition <- (GPR[rs]_63 = 1)
           GPR[31] <- PC + 8
     T+1:  if condition then
               PC <- PC + target
           else
               NullifyCurrentInstruction
           endif



Exceptions: 

None 
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BLTZL        Branch On Less Than Zero Likely        BLTZL





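 31      26 25    21 20    16 15                        0
+----------+--------+--------+---------------------------+
|  REGIMM  |   rs   | BLTZL  |          offset           |
|  000001  |        | 00010  |                           |
+----------+--------+--------+---------------------------+
     6         5        5                 16

Format:

BLTZL rs, offset
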
Description: 

A branch target address is computed from the sum of the address of 
the instruction in the delay slot and the 16-bit offset, shifted left two 
bits and sign-extended. If the contents of general register rs have the 
sign bit set, then the program branches to the target address, with a 
delay of one instruction. If the conditional branch is not taken, the in- 
struction in the branch delay slot is nullified. 

Operation: 



32   T:    target <- (offset_15)^14 || offset || 0^2
           condition <- (GPR[rs]_31 = 1)
     T+1:  if condition then
               PC <- PC + target
           else
               NullifyCurrentInstruction
           endif

64   T:    target <- (offset_15)^38 || offset || 0^2
           condition <- (GPR[rs]_63 = 1)
     T+1:  if condition then
               PC <- PC + target
           else
               NullifyCurrentInstruction
           endif



Exceptions: 

None 
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BNE 



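Branch On Not Equal

 31      26 25    21 20    16 15                        0
+----------+--------+--------+---------------------------+
|   BNE    |   rs   |   rt   |          offset           |
|  000101  |        |        |                           |
+----------+--------+--------+---------------------------+
     6         5        5                 16

Format:

BNE rs, rt, offset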



Description: 

A branch target address is computed from the sum of the address of 
the instruction in the delay slot and the 16-bit offset, shifted left two 
bits and sign-extended. The contents of general register rs and the con- 
tents of general register rt are compared. If the two registers are not 
equal, then the program branches to the target address, with a delay 
of one instruction. 

Operation: 



32   T:    target <- (offset_15)^14 || offset || 0^2
           condition <- (GPR[rs] ≠ GPR[rt])
     T+1:  if condition then
               PC <- PC + target
           endif

64   T:    target <- (offset_15)^38 || offset || 0^2
           condition <- (GPR[rs] ≠ GPR[rt])
     T+1:  if condition then
               PC <- PC + target
           endif



Exceptions: 

None 
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BNEL 



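Branch On Not Equal Likely

 31      26 25    21 20    16 15                        0
+----------+--------+--------+---------------------------+
|   BNEL   |   rs   |   rt   |          offset           |
|  010101  |        |        |                           |
+----------+--------+--------+---------------------------+
     6         5        5                 16

Format:

BNEL rs, rt, offset
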
Description: 

A branch target address is computed from the sum of the address of 
the instruction in the delay slot and the 16-bit offset, shifted left two 
bits and sign-extended. The contents of general register rs and the con- 
tents of general register rt are compared. If the two registers are not 
equal, then the program branches to the target address, with a delay 
of one instruction. 

If the conditional branch is not taken, the instruction in the branch de- 
lay slot is nullified. 

Operation: 



32   T:    target <- (offset_15)^14 || offset || 0^2
           condition <- (GPR[rs] ≠ GPR[rt])
     T+1:  if condition then
               PC <- PC + target
           else
               NullifyCurrentInstruction
           endif

64   T:    target <- (offset_15)^38 || offset || 0^2
           condition <- (GPR[rs] ≠ GPR[rt])
     T+1:  if condition then
               PC <- PC + target
           else
               NullifyCurrentInstruction
           endif



Exceptions: 

None 
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BREAK 



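Breakpoint

 31      26 25                                6 5        0
+----------+-----------------------------------+----------+
| SPECIAL  |               code                |  BREAK   |
|  000000  |                                   |  001101  |
+----------+-----------------------------------+----------+
     6                      20                      6

Format:

BREAK
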
Description: 

A breakpoint trap occurs, immediately and unconditionally transferring
control to the exception handler.

The code field is available for use as software parameters, but is
retrieved by the exception handler only by loading the contents of the
memory word containing the instruction.

Operation: 



32, 64 T: BreakpointException 



Exceptions: 

Breakpoint exception 
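
Since the code field is visible to the handler only through the instruction word itself, a handler that has loaded that word would extract bits 25..6. The Python sketch below shows the bit manipulation only; the function name `break_code` is an assumption, and how the handler locates and loads the word is outside the scope of this example.

```python
def break_code(instruction_word: int) -> int:
    """Return the 20-bit code field (bits 25..6) of a BREAK instruction."""
    opcode = (instruction_word >> 26) & 0x3F  # bits 31..26, SPECIAL
    funct = instruction_word & 0x3F           # bits 5..0, BREAK
    if opcode != 0b000000 or funct != 0b001101:
        raise ValueError("not a BREAK instruction")
    return (instruction_word >> 6) & 0xFFFFF
```

A BREAK assembled with code 7 has the word `(7 << 6) | 0b001101`, from which `break_code` recovers 7.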
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Cache 



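CACHE

 31      26 25    21 20    16 15                        0
+----------+--------+--------+---------------------------+
|  CACHE   |  base  |   op   |          offset           |
|  101111  |        |        |                           |
+----------+--------+--------+---------------------------+
     6         5        5                 16

Format:

CACHE op, offset(base)
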
Description: 

The 16-bit offset is sign-extended and added to the contents of general
register base to form a virtual address. The virtual address is translated
to a physical address using the TLB, and the 5-bit sub-opcode specifies
a cache operation for that address.

If CP0 is not usable (User or Supervisor mode) and the CP0 enable bit in
the Status register is clear, a coprocessor unusable exception is
taken. The operation of this instruction on any operation/cache
combination not listed below, or on a secondary cache when none is
present, is undefined. The operation of this instruction on uncached
addresses is also undefined.

The Index operation uses part of the virtual address to specify a cache
block.

For a primary cache of 2^CACHESIZE bytes with 2^BLOCKSIZE bytes per tag,
vAddr_CACHESIZE..BLOCKSIZE specifies the block. For a secondary cache
of 2^CACHESIZE bytes with 2^BLOCKSIZE bytes per tag,
pAddr_CACHESIZE..BLOCKSIZE specifies the block.

Index Load Tag also uses vAddr_BLOCKSIZE..3 to select the doubleword
for reading ECC or parity. When the CE bit of the Status register is set,
Hit WriteBack, Hit WriteBack Invalidate, Index WriteBack Invalidate,
and Fill also use vAddr_BLOCKSIZE..3 to select the doubleword that has
its ECC or parity modified. This operation is performed unconditionally.

The Hit operation accesses the specified cache as normal data refer- 
ences, and performs the specified operation if the cache block con- 
tains valid data with the specified physical address (a hit). If the cache 
block is invalid or contains a different address (a miss), no operation 
is performed. 
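
The 5-bit sub-opcode packs two selectors: this entry goes on to state that bits 17..16 of the instruction name the cache and bits 20..18 name the operation. The Python sketch below splits a CACHE instruction word accordingly. The function name `decode_cache` and the returned tuple layout are assumptions for illustration; the cache codes follow the cache table in this entry.

```python
CACHE_NAMES = {0: "I", 1: "D", 2: "SI", 3: "SD"}


def decode_cache(instruction_word: int) -> tuple:
    """Split a CACHE instruction word into (cache, operation, base, offset).

    cache is the name selected by bits 17..16; operation is the raw
    3-bit code from bits 20..18.
    """
    if (instruction_word >> 26) & 0x3F != 0b101111:
        raise ValueError("not a CACHE instruction")
    base = (instruction_word >> 21) & 0x1F
    operation = (instruction_word >> 18) & 0x7
    cache = CACHE_NAMES[(instruction_word >> 16) & 0x3]
    offset = instruction_word & 0xFFFF  # still to be sign-extended
    return cache, operation, base, offset
```

The same operation code can mean different things for different caches, so a full decoder would look up the pair (operation, cache) rather than the operation alone.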
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CACHE 



Cache 
(continued) 



CACHE 



Write back from a primary cache goes to the secondary cache (if there
is one), otherwise to memory. Write back from a secondary cache
always goes to memory. A secondary write back always writes the most
recent data; the data comes from the primary data cache, if present
and modified (the W bit is set). Otherwise the data comes from the
specified secondary cache. The address to be written is specified by
the cache tag and not the translated physical address.

TLB Refill and TLB Invalid exceptions can occur on any operation. For
Index operations (where the physical address is used to index the
cache but need not match the cache tag) unmapped addresses may be
used to avoid TLB exceptions. This operation never causes TLB Modified
or Virtual Coherency exceptions.

Bits 17..16 of the instruction specify the cache as follows:



Code    Name    Cache
0       I       primary instruction
1       D       primary data
2       SI      secondary instruction
3       SD      secondary data (or combined instruction/data)

Bits 20..18 of the instruction specify the operation as follows: 



Code:      0
Caches:    I, SI
Name:      Index Invalidate
Operation: Set the cache state of the cache block to Invalid.

Code:      0
Caches:    D
Name:      Index WriteBack Invalidate
Operation: Examine the cache state and W bit of the primary data cache
           block at the index specified by the virtual address. If the
           state is not Invalid and the W bit is set, then write back
           the block to the secondary cache (if present) or to memory
           (if no secondary cache). The address to write is taken from
           the primary cache tag. When a secondary cache is present,
           and the CE bit of the Status register is set, the content of
           the ECC register is XORed into the computed check bits
           during the write to the secondary cache for the addressed
           doubleword. Set cache state of primary cache block to
           Invalid.

Code:      0
Caches:    SD
Name:      Index WriteBack Invalidate
Operation: Examine the cache state of the secondary data cache block at
           the index specified by the physical address. If the state is
           Dirty Exclusive or Dirty Shared, then write back the block
           to memory and set the cache state to Invalid. The address to
           write is taken from the secondary cache tag, which is not
           necessarily the physical address used to index the cache.
           Like all secondary write-backs, the operation writes any
           modified data (W bit set) from the primary data cache.
           Unlike Hit WriteBack Invalidate, the operation does not
           invalidate or clear the W bit in the primary D-cache. In all
           cases, the secondary cache block state is set to Invalid.

Code:      1
Caches:    all
Name:      Index Load Tag
Operation: Read the tag for the cache block at the specified index and
           place it into the TagLo and TagHi CP0 registers, ignoring
           ECC and parity errors. Also load the data ECC or parity bits
           into the ECC register.

Code:      2
Caches:    all
Name:      Index Store Tag
Operation: Write the tag for the cache block at the specified index
           from the TagLo and TagHi CP0 registers.

Code:      3
Caches:    SD
Name:      Create Dirty Exclusive
Operation: This operation is used to avoid loading data needlessly from
           memory when writing new contents into an entire cache block.
           If the cache block is valid but does not contain the
           specified address (a valid miss) the secondary block is
           vacated. The data is written back to memory if dirty and all
           matching blocks in both primary caches are invalidated. As
           usual during a secondary write-back, if the primary data
           cache contains modified data (matching blocks with W bit
           set) that modified data is written to memory. If the cache
           block is valid and does contain the specified physical
           address (a hit), then the operation cleans up the primary
           caches to avoid any virtual alias problems: all blocks in
           both primary caches that match the secondary line are
           invalidated without write back. Note that the search for
           matching primary blocks uses the virtual index of the PIdx
           field of the secondary cache tag (the virtual index to the
           location last used) and not the virtual index of the virtual
           address used in the operation (the virtual index to the
           location now being used). If the secondary tag and address
           do not match (miss), or the tag and address do match (hit)
           and the block is in a shared state, send an invalidate for
           the specified address on the system interface. In all cases,
           set the cache block tag to the specified physical address,
           set the cache state to Dirty Exclusive, and set the virtual
           index field from the virtual address. The CH bit in the
           Status register is set or cleared to indicate a hit or miss.

Code:      3
Caches:    D
Name:      Create Dirty Exclusive
Operation: This operation is used to avoid loading data needlessly from
           secondary cache or memory when writing new contents into an
           entire cache block. If the cache block does not contain the
           specified address, and the block is dirty, write it back to
           the secondary cache or memory. In all cases, set the cache
           block tag to the specified physical address and set the
           cache state to Dirty Exclusive.

Code:      4
Caches:    I, D
Name:      Hit Invalidate
Operation: If the cache block contains the specified address, mark the
           cache block invalid.

Code:      4
Caches:    SI, SD
Name:      Hit Invalidate
Operation: If the cache block contains the specified address, mark the
           cache block invalid and also invalidate all matching blocks,
           if present, in the primary caches (the PIdx field of the
           secondary tag is used to determine the locations in the
           primaries to search). The CH bit in the Status register is
           set or cleared to indicate a hit or miss.

Code:      5
Caches:    D
Name:      Hit WriteBack Invalidate
Operation: If the cache block contains the specified address, write
           back the data if it is dirty, and mark the cache block
           invalid. When a secondary cache is present and the CE bit of
           the Status register is set, the contents of the ECC register
           are XORed into the computed check bits during the write to
           the secondary cache for the addressed doubleword.

Code:      5
Caches:    SD
Name:      Hit WriteBack Invalidate
Operation: If the cache block contains the specified address, write
           back the data if it is dirty, and mark the secondary cache
           block and all matching blocks in both primary caches
           invalid. As usual with secondary write-backs, modified data
           in the primary data cache (matching block with the W bit
           set) is used during the write-back. The PIdx field of the
           secondary tag is used to determine the locations in the
           primaries to check for matching primary blocks. The CH bit
           in the Status register is set or cleared to indicate a hit
           or miss.

Code:      5
Caches:    I
Name:      Fill
Operation: Fill the primary instruction cache block from secondary or
           memory. If the CE bit of the Status register is set, the
           contents of the ECC register are used instead of the
           computed parity bits for the addressed doubleword when
           written to the instruction cache.

Code:      6
Caches:    D
Name:      Hit WriteBack
Operation: If the cache block contains the specified address, and the
           W bit is set, write back the data to memory or the secondary
           cache, and clear the W bit. When a secondary cache is
           present, and the CE bit of the Status register is set, the
           contents of the ECC register are XORed into the computed
           check bits during the write to the secondary cache for the
           addressed doubleword.

Code:      6
Caches:    SD
Name:      Hit WriteBack
Operation: If the cache block contains the specified address, and the
           cache state is Dirty Exclusive or Dirty Shared, write back
           the data to memory, and change the cache state to Clean
           Exclusive or Shared, respectively. The CH bit in the Status
           register is set or cleared to indicate a hit or miss. The
           write back looks in the primary data cache for modified
           data, but does not invalidate or clear the W bit in the
           primary data cache. This state, although perhaps not
           intuitive, is consistent since the primary block contains
           data that is at least as current as that in memory or
           secondary cache. A subsequent write-back of the primary line
           without further modification would be redundant, but not
           incorrect.

Code:      6
Caches:    I
Name:      Hit WriteBack
Operation: If the cache block contains the specified address, write
           back the data unconditionally. When a secondary cache is
           present, and the CE bit of the Status register is set, the
           contents of the ECC register are XORed into the computed
           check bits during the write to the secondary cache for the
           addressed doubleword.

Code:      7
Caches:    SI, SD
Name:      Hit Set Virtual
Operation: This operation is used to change the virtual index of
           secondary cache contents, avoiding unnecessary memory
           operations. If the cache block contains the specified
           address, invalidate matching blocks in the primary caches at
           the index formed by concatenating PIdx in the secondary
           cache tag (not the virtual address of the operation) and
           vAddr_{11..4}, then set the virtual index field of the
           secondary cache tag from the specified virtual address.
           Modified data in the primary data cache is not preserved by
           the operation and should be explicitly written back before
           this operation. The CH bit in the Status register is set or
           cleared to indicate a hit or miss.

Operation:

32, 64    T:    vAddr <- ((offset_{15})^{48} || offset_{15..0}) + GPR[base]
                (pAddr, uncached) <- AddressTranslation (vAddr, DATA)



Exceptions: 

Coprocessor unusable exception 
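
The two instruction fields described for CACHE (bits 20..18 for the operation, bits 17..16 for the cache) can be separated as in the sketch below. The helper name and the dictionary are illustrative only; the sketch looks at nothing but those five bits and makes no claim about the rest of the instruction word.

```python
from typing import Tuple

# Cache selector names for bits 17..16, as listed in the table above.
CACHE_NAMES = {0: "I", 1: "D", 2: "SI", 3: "SD"}


def decode_cache_op(instruction: int) -> Tuple[int, str]:
    """Return (operation code, cache name) from bits 20..16 of a CACHE
    instruction word. Hypothetical helper for illustration."""
    operation = (instruction >> 18) & 0x7   # bits 20..18
    cache = (instruction >> 16) & 0x3       # bits 17..16
    return operation, CACHE_NAMES[cache]
```

For example, a word whose bits 20..16 are 10101 decodes to operation 5 on cache D, which the operation table lists as Hit WriteBack Invalidate on the primary data cache.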
CFCz    Move Control From Coprocessor    CFCz

Bits     31..26     25..21   20..16   15..11   10..0
Field    COPz       CF       rt       rd       0
Value    0100xx*    00010                      000 0000 0000
Width    6          5        5        5        11



Format:

CFCz rt, rd

Description:

The contents of coprocessor control register rd of coprocessor unit z
are loaded into general register rt.

This instruction is not valid for CP0.

Operation:

32    T:      data <- CCR[z,rd]
      T+1:    GPR[rt] <- data

64    T:      data <- (CCR[z,rd]_{31})^{32} || CCR[z,rd]
      T+1:    GPR[rt] <- data



Exceptions: 

Coprocessor unusable exception 

*Opcode Bit Encoding:

Bit #    31 30 29 28 27 26    25 24 23 22 21
CFC1      0  1  0  0  0  1     0  0  0  1  0
CFC2      0  1  0  0  1  0     0  0  0  1  0
CFC3      0  1  0  0  1  1     0  0  0  1  0

Bits 31..28: Opcode
Bits 27..26: Coprocessor Unit Number
Bits 25..21: Coprocessor Suboperation

COPz    Coprocessor Operation    COPz

Bits     31..26     25    24..0
Field    COPz       CO    cofun
Value    0100xx*    1
Width    6          1     25



Format:

COPz cofun

Description:

A coprocessor operation is performed. The operation may specify and
reference internal coprocessor registers, and may change the state of
the coprocessor condition line, but does not modify state within the
processor or the cache/memory system. Details of coprocessor
operations are contained in Appendix B.

Operation:

32, 64    T:    CoprocessorOperation (z, cofun)



Exceptions: 

Coprocessor unusable exception 

Coprocessor interrupt or Floating-Point Exception (R4000 CP1 only) 

*Opcode Bit Encoding:

Bit #    31 30 29 28 27 26    25
COP0      0  1  0  0  0  0     1
COP1      0  1  0  0  0  1     1
COP2      0  1  0  0  1  0     1
COP3      0  1  0  0  1  1     1

Bits 27..26: Coprocessor Unit Number
Bit 25:      CO sub-opcode (see end of Appendix A)

CTCz    Move Control to Coprocessor    CTCz

Bits     31..26     25..21   20..16   15..11   10..0
Field    COPz       CT       rt       rd       0
Value    0100xx*    00110                      000 0000 0000
Width    6          5        5        5        11



Format:

CTCz rt, rd

Description:

The contents of general register rt are loaded into control register rd
of coprocessor unit z.

This instruction is not valid for CP0.

Operation:

32, 64    T:      data <- GPR[rt]
          T+1:    CCR[z,rd] <- data



Exceptions: 

Coprocessor unusable exception

*See "CPU Instruction Opcode Bit Encoding" at the end of Appendix A.
DADD    Doubleword Add    DADD

Bits     31..26     25..21   20..16   15..11   10..6    5..0
Field    SPECIAL    rs       rt       rd       0        DADD
Value    000000                                00000    101100
Width    6          5        5        5        5        6

Format:

DADD rd, rs, rt

Description: 

The contents of general register rs and the contents of general register
rt are added to form the result. The result is placed into general
register rd.

An overflow exception occurs if the carries out of bits 62 and 63 differ
(2's-complement overflow). The destination register rd is not modified
when an integer overflow exception occurs.

This operation is only defined for the R4000 operating in 64-bit mode.
Execution of this instruction in 32-bit mode causes a reserved
instruction exception.

Operation:

64    T:    GPR[rd] <- GPR[rs] + GPR[rt]



Exceptions: 

Integer overflow exception 

Reserved instruction exception (R4000 in 32-bit mode) 
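
The overflow rule for DADD can be modelled as follows. This is a minimal sketch, not a description of the hardware: the exception class and function names are hypothetical, and the "carries out of bits 62 and 63 differ" test is expressed as the equivalent check that the true sum falls outside the signed 64-bit range.

```python
MASK64 = (1 << 64) - 1


class IntegerOverflow(Exception):
    """Stands in for the integer overflow exception (hypothetical name)."""


def to_signed64(value: int) -> int:
    """Interpret the low 64 bits of value as a 2's-complement number."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value


def dadd(rs: int, rt: int) -> int:
    """Return the 64-bit sum of two register values.

    Raises IntegerOverflow instead of returning when the signed sum does
    not fit in 64 bits, so the caller never writes rd in that case.
    """
    total = to_signed64(rs) + to_signed64(rt)
    if not -(1 << 63) <= total < (1 << 63):
        raise IntegerOverflow("carries out of bits 62 and 63 differ")
    return total & MASK64
```

Adding 1 to the largest positive value, 0x7FFFFFFFFFFFFFFF, raises the exception in this model, while adding 1 to 0xFFFFFFFFFFFFFFFF (that is, -1) simply yields 0.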
DADDI    Doubleword Add Immediate    DADDI

Bits     31..26     25..21   20..16   15..0
Field    DADDI      rs       rt       immediate
Value    011000
Width    6          5        5        16



Format:

DADDI rt, rs, immediate

Description:

The 16-bit immediate is sign-extended and added to the contents of
general register rs to form the result. The result is placed into
general register rt.

An overflow exception occurs if carries out of bits 62 and 63 differ
(2's-complement overflow). The destination register rt is not modified
when an integer overflow exception occurs.

This operation is only defined for the R4000 operating in 64-bit mode.
Execution of this instruction in 32-bit mode causes a reserved
instruction exception.

Operation:

64    T:    GPR[rt] <- GPR[rs] + (immediate_{15})^{48} || immediate_{15..0}



Exceptions: 

Integer overflow exception 

Reserved instruction exception (R4000 in 32-bit mode) 
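
The sign extension written as (immediate_{15})^{48} || immediate_{15..0} can be illustrated with a short Python sketch. The function names are hypothetical, and the sketch reports overflow as a flag rather than modelling the exception itself; for DADDI a set flag corresponds to the case where rt is left unmodified.

```python
MASK64 = (1 << 64) - 1


def sign_extend16(immediate: int) -> int:
    """Replicate bit 15 of the immediate into bits 63..16."""
    immediate &= 0xFFFF
    return immediate | (MASK64 ^ 0xFFFF) if immediate & 0x8000 else immediate


def daddi(rs: int, immediate: int):
    """Return (64-bit sum, overflow flag) for rs plus the sign-extended
    immediate. Overflow means both operands share a sign that the result
    does not."""
    rs &= MASK64
    operand = sign_extend16(immediate)
    result = (rs + operand) & MASK64
    overflow = bool(~(rs ^ operand) & (rs ^ result) & (1 << 63))
    return result, overflow
```

An immediate of 0xFFFF therefore acts as -1: daddi(5, 0xFFFF) gives 4 with no overflow.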
DADDIU    Doubleword Add Immediate Unsigned    DADDIU

Bits     31..26     25..21   20..16   15..0
Field    DADDIU     rs       rt       immediate
Value    011001
Width    6          5        5        16







Format:

DADDIU rt, rs, immediate

Description:

The 16-bit immediate is sign-extended and added to the contents of
general register rs to form the result. The result is placed into
general register rt. No integer overflow exception occurs under any
circumstances.

The only difference between this instruction and the DADDI instruction
is that DADDIU never causes an overflow exception.

This operation is only defined for the R4000 operating in 64-bit mode.
Execution of this instruction in 32-bit mode causes a reserved
instruction exception.

Operation:

64    T:    GPR[rt] <- GPR[rs] + (immediate_{15})^{48} || immediate_{15..0}



Exceptions: 

Reserved instruction exception (R4000 in 32-bit mode) 
DADDU    Doubleword Add Unsigned    DADDU

Bits     31..26     25..21   20..16   15..11   10..6    5..0
Field    SPECIAL    rs       rt       rd       0        DADDU
Value    000000                                00000    101101
Width    6          5        5        5        5        6



Format:

DADDU rd, rs, rt

Description:

The contents of general register rs and the contents of general register
rt are added to form the result. The result is placed into general
register rd.

No overflow exception occurs under any circumstances.

The only difference between this instruction and the DADD instruction
is that DADDU never causes an overflow exception.

This operation is only defined for the R4000 operating in 64-bit mode.
Execution of this instruction in 32-bit mode causes a reserved
instruction exception.

Operation:

64    T:    GPR[rd] <- GPR[rs] + GPR[rt]



Exceptions: 

Reserved instruction exception (R4000 in 32-bit mode) 
DDIV    Doubleword Divide    DDIV

Bits     31..26     25..21   20..16   15..6           5..0
Field    SPECIAL    rs       rt       0               DDIV
Value    000000                       00 0000 0000    011110
Width    6          5        5        10              6



Format:

DDIV rs, rt

Description:

The contents of general register rs are divided by the contents of
general register rt, treating both operands as 2's-complement values.
No overflow exception occurs under any circumstances, and the result of
this operation is undefined when the divisor is zero.

This instruction is typically followed by additional instructions to
check for a zero divisor and for overflow.

When the operation completes, the quotient word of the double result
is loaded into special register LO, and the remainder word of the
double result is loaded into special register HI.

If either of the two preceding instructions is MFHI or MFLO, the
results of those instructions are undefined. Correct operation requires
separating reads of HI or LO from writes by two or more instructions.

This operation is only defined for the R4000 operating in 64-bit mode.
Execution of this instruction in 32-bit mode causes a reserved
instruction exception.

Operation:

64    T-2:    LO <- undefined
              HI <- undefined
      T-1:    LO <- undefined
              HI <- undefined
      T:      LO <- GPR[rs] div GPR[rt]
              HI <- GPR[rs] mod GPR[rt]



Exceptions: 

Reserved instruction exception (R4000 in 32-bit mode) 
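
A signed doubleword divide can be sketched as below. Two points are assumptions of the sketch rather than statements from the text above: the quotient is truncated toward zero (so the remainder takes the sign of the dividend), and an unrepresentable quotient is simply wrapped to 64 bits. The undefined zero-divisor case is reported as None; all names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def to_signed64(value: int) -> int:
    """Interpret the low 64 bits of value as a 2's-complement number."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value


def ddiv(rs: int, rt: int):
    """Return (LO, HI) = (quotient, remainder) as 64-bit patterns,
    or None when the divisor is zero (result undefined)."""
    dividend, divisor = to_signed64(rs), to_signed64(rt)
    if divisor == 0:
        return None
    quotient = abs(dividend) // abs(divisor)
    if (dividend < 0) != (divisor < 0):
        quotient = -quotient          # assumed: truncate toward zero
    remainder = dividend - quotient * divisor
    return quotient & MASK64, remainder & MASK64
```

The None return mirrors the software check for a zero divisor that the description says normally follows the instruction.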
DDIVU    Doubleword Divide Unsigned    DDIVU

Bits     31..26     25..21   20..16   15..6           5..0
Field    SPECIAL    rs       rt       0               DDIVU
Value    000000                       00 0000 0000    011111
Width    6          5        5        10              6



Format:

DDIVU rs, rt

Description:

The contents of general register rs are divided by the contents of
general register rt, treating both operands as unsigned values. No
integer overflow exception occurs under any circumstances, and the
result of this operation is undefined when the divisor is zero.

This instruction is typically followed by additional instructions to
check for a zero divisor.

When the operation completes, the quotient word of the double result
is loaded into special register LO, and the remainder word of the
double result is loaded into special register HI.

If either of the two preceding instructions is MFHI or MFLO, the
results of those instructions are undefined. Correct operation requires
separating reads of HI or LO from writes by two or more instructions.

This operation is only defined for the R4000 operating in 64-bit mode.
Execution of this instruction in 32-bit mode causes a reserved
instruction exception.

Operation:

64    T-2:    LO <- undefined
              HI <- undefined
      T-1:    LO <- undefined
              HI <- undefined
      T:      LO <- (0 || GPR[rs]) div (0 || GPR[rt])
              HI <- (0 || GPR[rs]) mod (0 || GPR[rt])



Exceptions: 

Reserved instruction exception (R4000 in 32-bit mode) 
DIV    Divide    DIV

Bits     31..26     25..21   20..16   15..6           5..0
Field    SPECIAL    rs       rt       0               DIV
Value    000000                       00 0000 0000    011010
Width    6          5        5        10              6

Format:

DIV rs, rt

Description: 

The contents of general register rs are divided by the contents of
general register rt, treating both operands as 2's-complement values.
No overflow exception occurs under any circumstances, and the result of
this operation is undefined when the divisor is zero. In 64-bit mode,
the operands must be valid sign-extended, 32-bit values.

This instruction is typically followed by additional instructions to
check for a zero divisor and for overflow.

When the operation completes, the quotient word of the double result
is loaded into special register LO, and the remainder word of the
double result is loaded into special register HI.

If either of the two preceding instructions is MFHI or MFLO, the
results of those instructions are undefined. Correct operation requires
separating reads of HI or LO from writes by two or more instructions.

Operation:

32    T-2:    LO <- undefined
              HI <- undefined
      T-1:    LO <- undefined
              HI <- undefined
      T:      LO <- GPR[rs] div GPR[rt]
              HI <- GPR[rs] mod GPR[rt]

64    T-2:    LO <- undefined
              HI <- undefined
      T-1:    LO <- undefined
              HI <- undefined
      T:      q  <- GPR[rs]_{31..0} div GPR[rt]_{31..0}
              r  <- GPR[rs]_{31..0} mod GPR[rt]_{31..0}
              LO <- (q_{31})^{32} || q_{31..0}
              HI <- (r_{31})^{32} || r_{31..0}



Exceptions: 

None 
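
The 64-bit-mode behaviour of DIV (divide the low 32 bits, then sign-extend each 32-bit result into LO and HI) might be modelled as follows. The names are hypothetical, truncation toward zero is an assumption of the sketch, and the undefined zero-divisor case is reported as None.

```python
MASK64 = (1 << 64) - 1


def to_signed32(value: int) -> int:
    """Interpret bits 31..0 of value as a 2's-complement number."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >> 31 else value


def sign_extend32(value: int) -> int:
    """Form (value_31)^32 || value_31..0 as a 64-bit pattern."""
    return to_signed32(value) & MASK64


def div_64bit_mode(rs: int, rt: int):
    """Return (LO, HI) for DIV in 64-bit mode, or None for a zero divisor."""
    dividend, divisor = to_signed32(rs), to_signed32(rt)
    if divisor == 0:
        return None
    q = abs(dividend) // abs(divisor)
    if (dividend < 0) != (divisor < 0):
        q = -q                        # assumed: truncate toward zero
    r = dividend - q * divisor
    return sign_extend32(q), sign_extend32(r)
```

Dividing -7 by 2 leaves 0xFFFFFFFFFFFFFFFD in LO and 0xFFFFFFFFFFFFFFFF in HI under these assumptions, so both results remain valid sign-extended 32-bit values.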
DIVU    Divide Unsigned    DIVU

Bits     31..26     25..21   20..16   15..6           5..0
Field    SPECIAL    rs       rt       0               DIVU
Value    000000                       00 0000 0000    011011
Width    6          5        5        10              6

Format:

DIVU rs, rt

Description: 

The contents of general register rs are divided by the contents of
general register rt, treating both operands as unsigned values. No
integer overflow exception occurs under any circumstances, and the
result of this operation is undefined when the divisor is zero. In
64-bit mode, the operands must be valid sign-extended, 32-bit values.

This instruction is typically followed by additional instructions to
check for a zero divisor.

When the operation completes, the quotient word of the double result
is loaded into special register LO, and the remainder word of the
double result is loaded into special register HI.

If either of the two preceding instructions is MFHI or MFLO, the
results of those instructions are undefined. Correct operation requires
separating reads of HI or LO from writes by two or more instructions.

Operation:

32    T-2:    LO <- undefined
              HI <- undefined
      T-1:    LO <- undefined
              HI <- undefined
      T:      LO <- (0 || GPR[rs]) div (0 || GPR[rt])
              HI <- (0 || GPR[rs]) mod (0 || GPR[rt])

64    T-2:    LO <- undefined
              HI <- undefined
      T-1:    LO <- undefined
              HI <- undefined
      T:      q  <- (0 || GPR[rs]_{31..0}) div (0 || GPR[rt]_{31..0})
              r  <- (0 || GPR[rs]_{31..0}) mod (0 || GPR[rt]_{31..0})
              LO <- (q_{31})^{32} || q_{31..0}
              HI <- (r_{31})^{32} || r_{31..0}



Exceptions: 

None 






DMFC0    Doubleword Move From System Control Coprocessor    DMFC0

Bits     31..26     25..21   20..16   15..11   10..0
Field    COP0       DMF      rt       rd       0
Value    010000     00001                      000 0000 0000
Width    6          5        5        5        11



Format:

DMFC0 rt, rd

Description:

The contents of coprocessor register rd of the CP0 are loaded into
general register rt.

This operation is defined for the R4000 operating in 64-bit mode and
in 32-bit kernel mode. Execution of this instruction in 32-bit user or
supervisor mode causes a reserved instruction exception. All 64 bits of
the general register destination are written from the coprocessor
register source. The operation of DMFC0 on a 32-bit coprocessor
register is undefined.

Operation:

64    T:      data <- CPR[0,rd]
      T+1:    GPR[rt] <- data

Exceptions:

Coprocessor unusable exception

Reserved instruction exception (R4000 in 32-bit user mode,
R4000 in 32-bit supervisor mode)






DMTC0    Doubleword Move To System Control Coprocessor    DMTC0

Bits     31..26     25..21   20..16   15..11   10..0
Field    COP0       DMT      rt       rd       0
Value    010000     00101                      000 0000 0000
Width    6          5        5        5        11



Format:

DMTC0 rt, rd

Description:

The contents of general register rt are loaded into coprocessor register
rd of the CP0.

This operation is defined for the R4000 operating in 64-bit mode or in
32-bit kernel mode. Execution of this instruction in 32-bit user or
supervisor mode causes a reserved instruction exception. All 64 bits of
the coprocessor register are written from the general register source.
The operation of DMTC0 on a 32-bit coprocessor register is undefined.

Because the state of the virtual address translation system may be
altered by this instruction, the operation of load and store
instructions and TLB operations immediately prior to and after this
instruction is undefined.

Operation:

64    T:      data <- GPR[rt]
      T+1:    CPR[0,rd] <- data



Exceptions: 

Coprocessor unusable exception (R4000 in 32-bit user mode 

R4000 in 32-bit supervisor mode) 
DMULT    Doubleword Multiply    DMULT

Bits     31..26     25..21   20..16   15..6           5..0
Field    SPECIAL    rs       rt       0               DMULT
Value    000000                       00 0000 0000    011100
Width    6          5        5        10              6



Format:

DMULT rs, rt

Description:

The contents of general registers rs and rt are multiplied, treating both
operands as 2's-complement values. No integer overflow exception
occurs under any circumstances.

When the operation completes, the low-order word of the double result
is loaded into special register LO, and the high-order word of the
double result is loaded into special register HI.

If either of the two preceding instructions is MFHI or MFLO, the
results of these instructions are undefined. Correct operation requires
separating reads of HI or LO from writes by a minimum of two other
instructions.

This operation is only defined for the R4000 operating in 64-bit mode.
Execution of this instruction in 32-bit mode causes a reserved
instruction exception.

Operation:

64    T-2:    LO <- undefined
              HI <- undefined
      T-1:    LO <- undefined
              HI <- undefined
      T:      t  <- GPR[rs] * GPR[rt]
              LO <- t_{63..0}
              HI <- t_{127..64}



Exceptions: 

Reserved instruction exception (R4000 in 32-bit mode) 
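
The split of the 128-bit signed product into LO and HI can be sketched in Python, whose integers are unbounded. The helper names are hypothetical; the sketch only illustrates t_{63..0} and t_{127..64} and ignores the timing rules for HI and LO.

```python
MASK64 = (1 << 64) - 1
MASK128 = (1 << 128) - 1


def to_signed64(value: int) -> int:
    """Interpret the low 64 bits of value as a 2's-complement number."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value


def dmult(rs: int, rt: int):
    """Return (LO, HI): the low and high 64 bits of the signed product."""
    t = (to_signed64(rs) * to_signed64(rt)) & MASK128
    return t & MASK64, t >> 64
```

Multiplying -1 by 2 gives LO = 0xFFFFFFFFFFFFFFFE and HI = 0xFFFFFFFFFFFFFFFF, the 128-bit 2's-complement form of -2.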
DMULTU    Doubleword Multiply Unsigned    DMULTU

Bits     31..26     25..21   20..16   15..6           5..0
Field    SPECIAL    rs       rt       0               DMULTU
Value    000000                       00 0000 0000    011101
Width    6          5        5        10              6



Format:

DMULTU rs, rt

Description:

The contents of general register rs and the contents of general register
rt are multiplied, treating both operands as unsigned values. No
overflow exception occurs under any circumstances.

When the operation completes, the low-order word of the double result
is loaded into special register LO, and the high-order word of the
double result is loaded into special register HI.

If either of the two preceding instructions is MFHI or MFLO, the
results of these instructions are undefined. Correct operation requires
separating reads of HI or LO from writes by a minimum of two
instructions.

This operation is only defined for the R4000 operating in 64-bit mode.
Execution of this instruction in 32-bit mode causes a reserved
instruction exception.

Operation:

64    T-2:    LO <- undefined
              HI <- undefined
      T-1:    LO <- undefined
              HI <- undefined
      T:      t  <- (0 || GPR[rs]) * (0 || GPR[rt])
              LO <- t_{63..0}
              HI <- t_{127..64}



Exceptions: 

Reserved instruction exception (R4000 in 32-bit mode) 
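
For the unsigned case the same split applies to the product of the zero-extended operands. A minimal sketch, with a hypothetical function name:

```python
MASK64 = (1 << 64) - 1


def dmultu(rs: int, rt: int):
    """Return (LO, HI): the low and high 64 bits of the unsigned product."""
    t = (rs & MASK64) * (rt & MASK64)   # at most 128 bits wide
    return t & MASK64, t >> 64
```

Squaring the largest unsigned doubleword, 0xFFFFFFFFFFFFFFFF, gives LO = 1 and HI = 0xFFFFFFFFFFFFFFFE, which shows why no overflow can occur: the full product always fits in the HI/LO pair.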
DSLL    Doubleword Shift Left Logical    DSLL

Bits     31..26     25..21   20..16   15..11   10..6    5..0
Field    SPECIAL    0        rt       rd       sa       DSLL
Value    000000     00000                               111000
Width    6          5        5        5        5        6



Format:

DSLL rd, rt, sa

Description:

The contents of general register rt are shifted left by sa bits, inserting
zeros into the low-order bits. The result is placed in register rd.

Operation:

64    T:    s <- 0 || sa
            GPR[rd] <- GPR[rt]_{(63-s)..0} || 0^{s}

Exceptions: 

Reserved instruction exception (R4000 in 32-bit mode) 
DSLLV    Doubleword Shift Left Logical Variable    DSLLV

Bits     31..26     25..21   20..16   15..11   10..6    5..0
Field    SPECIAL    rs       rt       rd       0        DSLLV
Value    000000                                00000    010100
Width    6          5        5        5        5        6



Format:

DSLLV rd, rt, rs

Description:

The contents of general register rt are shifted left by the number of
bits specified by the low-order six bits of general register rs,
inserting zeros into the low-order bits. The result is placed in
register rd.

This operation is only defined for the R4000 operating in 64-bit mode.
Execution of this instruction in 32-bit mode causes a reserved
instruction exception.

Operation:

64    T:    s <- GPR[rs]_{5..0}
            GPR[rd] <- GPR[rt]_{(63-s)..0} || 0^{s}



Exceptions: 



Reserved instruction exception (R4000 in 32-bit mode) 
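
Because only the low-order six bits of rs are used, a shift count of 64 behaves like a count of 0. The sketch below (hypothetical function name) makes that masking explicit.

```python
MASK64 = (1 << 64) - 1


def dsllv(rt: int, rs: int) -> int:
    """Shift rt left by the low six bits of rs, keeping 64 bits."""
    s = rs & 0x3F                      # s <- GPR[rs]_{5..0}
    return ((rt & MASK64) << s) & MASK64
```

Bits shifted past bit 63 are discarded and zeros enter from the right, matching GPR[rt]_{(63-s)..0} || 0^{s}.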
DSLL32    Doubleword Shift Left Logical + 32    DSLL32

Bits     31..26     25..21   20..16   15..11   10..6    5..0
Field    SPECIAL    0        rt       rd       sa       DSLL32
Value    000000     00000                               111100
Width    6          5        5        5        5        6



Format:

DSLL32 rd, rt, sa

Description:

The contents of general register rt are shifted left by 32+sa bits,
inserting zeros into the low-order bits. The result is placed in
register rd.

This operation is only defined for the R4000 operating in 64-bit mode.
Execution of this instruction in 32-bit mode causes a reserved
instruction exception.

Operation:

64    T:    s <- 1 || sa
            GPR[rd] <- GPR[rt]_{(63-s)..0} || 0^{s}

Exceptions: 

Reserved instruction exception (R4000 in 32-bit mode) 
DSRA    Doubleword Shift Right Arithmetic    DSRA

Bits     31..26     25..21   20..16   15..11   10..6    5..0
Field    SPECIAL    0        rt       rd       sa       DSRA
Value    000000     00000                               111011
Width    6          5        5        5        5        6



Format:

DSRA rd, rt, sa

Description:

The contents of general register rt are shifted right by sa bits,
sign-extending the high-order bits. The result is placed in register rd.

This operation is only defined for the R4000 operating in 64-bit mode.
Execution of this instruction in 32-bit mode causes a reserved
instruction exception.

Operation:

64    T:    s <- 0 || sa
            GPR[rd] <- (GPR[rt]_{63})^{s} || GPR[rt]_{63..s}



Exceptions: 

Reserved instruction exception (R4000 in 32-bit mode) 
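
The arithmetic right shifts differ only in how the shift count s is formed: the 5-bit sa field for DSRA, sa plus 32 for DSRA32, and the low six bits of rs for DSRAV. A small sketch with hypothetical function names:

```python
MASK64 = (1 << 64) - 1


def to_signed64(value: int) -> int:
    """Interpret the low 64 bits of value as a 2's-complement number."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value


def shift_right_arithmetic(rt: int, s: int) -> int:
    """Form (rt_63)^s || rt_63..s; Python's >> on a negative int
    already replicates the sign."""
    return (to_signed64(rt) >> s) & MASK64


def dsra(rt: int, sa: int) -> int:
    return shift_right_arithmetic(rt, sa & 0x1F)         # s <- 0 || sa


def dsra32(rt: int, sa: int) -> int:
    return shift_right_arithmetic(rt, 32 + (sa & 0x1F))  # s <- 1 || sa


def dsrav(rt: int, rs: int) -> int:
    return shift_right_arithmetic(rt, rs & 0x3F)         # s <- rs_{5..0}
```

Shifting 0x8000000000000000 right by 4 fills the vacated bits with copies of bit 63, giving 0xF800000000000000.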
DSRAV    Doubleword Shift Right Arithmetic Variable    DSRAV

Bits     31..26     25..21   20..16   15..11   10..6    5..0
Field    SPECIAL    rs       rt       rd       0        DSRAV
Value    000000                                00000    010111
Width    6          5        5        5        5        6



Format:

DSRAV rd, rt, rs

Description:

The contents of general register rt are shifted right by the number of
bits specified by the low-order six bits of general register rs,
sign-extending the high-order bits. The result is placed in register rd.

This operation is only defined for the R4000 operating in 64-bit mode.
Execution of this instruction in 32-bit mode causes a reserved
instruction exception.

Operation:

64    T:    s <- GPR[rs]_{5..0}
            GPR[rd] <- (GPR[rt]_{63})^{s} || GPR[rt]_{63..s}



Exceptions: 

Reserved instruction exception (R4000 in 32-bit mode) 
DSRA32    Doubleword Shift Right Arithmetic + 32    DSRA32

Bits     31..26     25..21   20..16   15..11   10..6    5..0
Field    SPECIAL    0        rt       rd       sa       DSRA32
Value    000000     00000                               111111
Width    6          5        5        5        5        6



Format:

DSRA32 rd, rt, sa

Description:

The contents of general register rt are shifted right by 32+sa bits,
sign-extending the high-order bits. The result is placed in register rd.

This operation is only defined for the R4000 operating in 64-bit mode.
Execution of this instruction in 32-bit mode causes a reserved
instruction exception.

Operation:

64    T:    s <- 1 || sa
            GPR[rd] <- (GPR[rt]_{63})^{s} || GPR[rt]_{63..s}

Exceptions: 

Reserved instruction exception (R4000 in 32-bit mode) 
DSRL    Doubleword Shift Right Logical    DSRL

Bits     31..26     25..21   20..16   15..11   10..6    5..0
Field    SPECIAL    0        rt       rd       sa       DSRL
Value    000000     00000                               111010
Width    6          5        5        5        5        6



Format:

DSRL rd, rt, sa

Description:

The contents of general register rt are shifted right by sa bits, inserting
zeros into the high-order bits. The result is placed in register rd.

This operation is only defined for the R4000 operating in 64-bit mode.
Execution of this instruction in 32-bit mode causes a reserved
instruction exception.

Operation:

64    T:    s <- 0 || sa
            GPR[rd] <- 0^{s} || GPR[rt]_{63..s}

Exceptions: 

Reserved instruction exception (R4000 in 32-bit mode) 






DSRLV    Doubleword Shift Right Logical Variable    DSRLV

Bits     31..26     25..21   20..16   15..11   10..6    5..0
Field    SPECIAL    rs       rt       rd       0        DSRLV
Value    000000                                00000    010110
Width    6          5        5        5        5        6



Format:

DSRLV rd, rt, rs

Description:

The contents of general register rt are shifted right by the number of
bits specified by the low-order six bits of general register rs,
inserting zeros into the high-order bits. The result is placed in
register rd.

This operation is only defined for the R4000 operating in 64-bit mode.
Execution of this instruction in 32-bit mode causes a reserved
instruction exception.

Operation:

64    T:    s <- GPR[rs]_{5..0}
            GPR[rd] <- 0^{s} || GPR[rt]_{63..s}



Exceptions: 

Reserved instruction exception (R4000 in 32-bit mode) 
DSRL32    Doubleword Shift Right Logical + 32    DSRL32

Bits     31..26     25..21   20..16   15..11   10..6    5..0
Field    SPECIAL    0        rt       rd       sa       DSRL32
Value    000000     00000                               111110
Width    6          5        5        5        5        6





Format: 

DSRL32 rd, rt, sa 
Description: 

The contents of general register rt are shifted right by 32+sa bits, in- 
serting zeros into the high-order bits. The result is placed in register rd. 
This operation is only defined for the R4000 operating in 64-bit mode. 
Execution of this instruction in 32-bit mode causes a reserved instruc- 
tion exception. 

Operation: 

64 T: s <- 1 || sa 
      GPR[rd] <- 0^{s} || GPR[rt]_{63..s} 

Exceptions: 

Reserved instruction exception (R4000 in 32-bit mode) 
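
The four right-shift descriptions above differ only in where the shift amount comes from and in what fills the vacated high-order bits. The following Python sketch models them on 64-bit values held as non-negative integers. The function names mirror the mnemonics but are otherwise illustrative, and the 32-bit-mode reserved instruction exception is not modeled.

```python
MASK64 = (1 << 64) - 1


def dsrl(rt: int, sa: int) -> int:
    """Shift right by sa (0..31), inserting zeros at the top."""
    return (rt & MASK64) >> (sa & 0x1F)


def dsrl32(rt: int, sa: int) -> int:
    """Shift right by 32 + sa, inserting zeros at the top."""
    return (rt & MASK64) >> (32 + (sa & 0x1F))


def dsrlv(rt: int, rs: int) -> int:
    """Shift right by the low-order six bits of rs, inserting zeros."""
    return (rt & MASK64) >> (rs & 0x3F)


def dsra32(rt: int, sa: int) -> int:
    """Shift right by 32 + sa, replicating the sign bit (bit 63)."""
    value = rt & MASK64
    # Reinterpret as a signed quantity so Python's >> is an arithmetic shift.
    signed = value - (1 << 64) if value >> 63 else value
    return (signed >> (32 + (sa & 0x1F))) & MASK64


print(hex(dsrl32(0xFFFFFFFF00000000, 0)))
print(hex(dsra32(0x8000000000000000, 0)))
```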
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DSUB 



Doubleword Subtract 

 31     26 25     21 20     16 15     11 10      6 5       0
| SPECIAL |   rs    |   rt    |   rd    |    0    |  DSUB   |
| 000000  |         |         |         |  00000  | 101110  |
     6         5         5         5         5         6



Format: 

DSUB rd, rs, rt 
Description: 

The contents of general register rt are subtracted from the contents of 
general register rs to form a result. The result is placed into general 
register rd. 

The only difference between this instruction and the DSUBU instruc- 
tion is that DSUBU never traps on overflow. 

An integer overflow exception takes place if the carries out of bits 62 
and 63 differ (2's-complement overflow). The destination register rd is 
not modified when an integer overflow exception occurs. 
This operation is only defined for the R4000 operating in 64-bit mode. 
Execution of this instruction in 32-bit mode causes a reserved instruc- 
tion exception. 
Operation: 



64 T: GPR[rd] <- GPR[rs] - GPR[rt] 



Exceptions: 

Integer overflow exception 

Reserved instruction exception (R4000 in 32-bit mode) 
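
The difference between DSUB and DSUBU can be sketched in a few lines of Python. The sketch assumes the usual two's-complement equivalence: the carries out of bits 62 and 63 differ exactly when the mathematically exact signed difference does not fit in 64 bits. The exception class `IntegerOverflow` and the function names are hypothetical stand-ins.

```python
MASK64 = (1 << 64) - 1


class IntegerOverflow(Exception):
    """Stands in for the integer overflow exception."""


def to_signed64(x: int) -> int:
    """Interpret the low 64 bits of x as a two's-complement value."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x


def dsubu(rs: int, rt: int) -> int:
    """Subtract modulo 2**64; never traps."""
    return (rs - rt) & MASK64


def dsub(rs: int, rt: int) -> int:
    """Subtract and trap if the signed result is not representable."""
    exact = to_signed64(rs) - to_signed64(rt)
    if not -(1 << 63) <= exact < (1 << 63):
        # The destination register is left unmodified in this case.
        raise IntegerOverflow("2's-complement overflow in DSUB")
    return exact & MASK64
```

For example, subtracting 1 from the most negative doubleword traps under `dsub` but wraps to the most positive doubleword under `dsubu`.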
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DS U B U Doubleword Subtract Unsigned Qg U B U 



 31     26 25     21 20     16 15     11 10      6 5       0
| SPECIAL |   rs    |   rt    |   rd    |    0    |  DSUBU  |
| 000000  |         |         |         |  00000  | 101111  |
     6         5         5         5         5         6



Format: 

DSUBU rd, rs, rt 
Description: 

The contents of general register rt are subtracted from the contents of 
general register rs to form a result. The result is placed into general 
register rd. 

The only difference between this instruction and the DSUB instruction 
is that DSUBU never traps on overflow. No integer overflow excep- 
tion occurs under any circumstances. 

This operation is only defined for the R4000 operating in 64-bit mode. 
Execution of this instruction in 32-bit mode causes a reserved instruc- 
tion exception. 

Operation: 



64 T: GPR[rd] <- GPR[rs] - GPR[rt] 



Exceptions: 

Reserved instruction exception (R4000 in 32-bit mode) 
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ERET 



Exception Return 

 31     26  25  24                        6 5       0
|  COP0   | CO |  000 0000 0000 0000 0000  |  ERET   |
| 010000  | 1  |                           | 011000  |
     6      1                19                  6



Format: 



ERET 



Description: 

ERET is the R4000 instruction for returning from an interrupt, excep- 
tion, or error trap. Unlike a branch or jump instruction, ERET does not 
execute the next instruction. 

ERET must not itself be placed in a branch delay slot. 
If the processor is servicing an error trap (SR_2 = 1), then load the PC 
from the ErrorEPC and clear the ERL bit of the Status register (SR_2). 
Otherwise (SR_2 = 0), load the PC from the EPC, and clear the EXL bit 
of the Status register (SR_1). 

An ERET executed between a LL and SC also causes the SC to fail. 
Operation: 



32, 64 T: if SR_2 = 1 then 
              PC <- ErrorEPC 
              SR <- SR_{31..3} || 0 || SR_{1..0} 
          else 
              PC <- EPC 
              SR <- SR_{31..2} || 0 || SR_0 
          endif 
          LLbit <- 0 



Exceptions: 

Coprocessor unusable exception 
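
The state change performed by ERET can be summarized in a small Python model. The dictionary keys and the function name `eret` are assumptions made for illustration; only the bit positions (ERL is Status bit 2, EXL is Status bit 1) come from the description above.

```python
ERL = 1 << 2  # Status register bit 2
EXL = 1 << 1  # Status register bit 1


def eret(state: dict) -> dict:
    """Return the processor state after ERET, leaving the input untouched."""
    new = dict(state)
    if state["SR"] & ERL:
        # Returning from an error trap: resume at ErrorEPC and clear ERL.
        new["PC"] = state["ErrorEPC"]
        new["SR"] = state["SR"] & ~ERL
    else:
        # Returning from an ordinary exception: resume at EPC and clear EXL.
        new["PC"] = state["EPC"]
        new["SR"] = state["SR"] & ~EXL
    # A pending LL/SC sequence is broken, so a following SC fails.
    new["LLbit"] = 0
    return new


after = eret({"PC": 0, "SR": 0b110, "EPC": 0x100, "ErrorEPC": 0x200, "LLbit": 1})
print(hex(after["PC"]), bin(after["SR"]), after["LLbit"])
```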
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Jump 




Format: 

J target 
Description: 

The 26-bit target address is shifted left two bits and combined with the 
high-order bits of the address of the delay slot. The program uncondi- 
tionally jumps to this calculated address with a delay of one instruc- 
tion. 
Operation: 



32 T:   temp <- target 
   T+1: PC <- PC_{31..28} || temp || 0^{2} 

64 T:   temp <- target 
   T+1: PC <- PC_{63..28} || temp || 0^{2} 



Exceptions: 

None 
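
As a worked illustration of the target calculation, the Python sketch below combines the high-order bits of the delay-slot address with the 26-bit target field shifted left by two. The function name `jump_target` and its `bits` parameter are assumptions for illustration.

```python
def jump_target(delay_slot_pc: int, target: int, bits: int = 32) -> int:
    """Compute the destination of J/JAL from the 26-bit target field."""
    if bits not in (32, 64):
        raise ValueError("bits must be 32 or 64")
    mask = (1 << bits) - 1
    # Keep everything above bit 27 of the delay-slot address.
    high = delay_slot_pc & mask & ~0x0FFFFFFF
    # The target field supplies bits 27..2; bits 1..0 are zero.
    return high | ((target & 0x03FFFFFF) << 2)


print(hex(jump_target(0x80001004, 0x100)))
```

Because only bits 27..2 come from the instruction, a jump can reach any word within the 256-Mbyte-aligned region that contains the delay slot.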
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JAL Jump And Link JAL 




Format: 

JAL target 
Description: 

The 26-bit target address is shifted left two bits and combined with the 
high-order bits of the address of the delay slot. The program 
unconditionally jumps to this calculated address with a delay of one 
instruction. The address of the instruction after the delay slot is 
placed in the link register, r31. 
Operation: 



32 T:   temp <- target 
        GPR[31] <- PC + 8 
   T+1: PC <- PC_{31..28} || temp || 0^{2} 

64 T:   temp <- target 
        GPR[31] <- PC + 8 
   T+1: PC <- PC_{63..28} || temp || 0^{2} 



Exceptions: 

None 
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JALR 



Jump And Link Register 

 31     26 25     21 20     16 15     11 10      6 5       0
| SPECIAL |   rs    |    0    |   rd    |    0    |  JALR   |
| 000000  |         |  00000  |         |  00000  | 001001  |
     6         5         5         5         5         6



Format: 

JALR rs 
JALR rd, rs 

Description: 

The program unconditionally jumps to the address contained in general 
register rs, with a delay of one instruction. The address of the 
instruction after the delay slot is placed in general register rd. The 
default value of rd, if omitted in the assembly language instruction, is 31. 
Register specifiers rs and rd may not be equal, because such an 
instruction does not have the same effect when reexecuted. However, an 
attempt to execute this instruction is not trapped, and the result of 
executing such an instruction is undefined. 

Since instructions must be word-aligned, a Jump and Link Register in- 
struction must specify a target register (rs) whose two low-order bits 
are zero. If these low-order bits are not zero, an address exception will 
occur when the jump target instruction is subsequently fetched. 

Operation: 



32, 64 T:   temp <- GPR[rs] 
            GPR[rd] <- PC + 8 
       T+1: PC <- temp 



Exceptions: 

None 
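
The following Python sketch models the JALR behavior described above: the link value is PC + 8, rd defaults to 31, and a misaligned target is only detected when the target instruction is fetched. The names `jalr`, `fetch` and `AddressError` are hypothetical, and the model raises an error for rs equal to rd simply to flag the case the manual leaves undefined.

```python
class AddressError(Exception):
    """Stands in for the address exception taken on instruction fetch."""


def jalr(gpr: list, pc: int, rs: int, rd: int = 31) -> int:
    """Execute JALR at address pc; return the jump target."""
    if rs == rd:
        # Hardware does not trap this; the result is simply undefined.
        raise ValueError("JALR with rs == rd has an undefined result")
    temp = gpr[rs]
    gpr[rd] = pc + 8  # address of the instruction after the delay slot
    return temp


def fetch(pc: int) -> int:
    """Model the fetch-time alignment check on the jump target."""
    if pc & 0b11:
        raise AddressError(f"misaligned instruction address {pc:#x}")
    return pc


gpr = [0] * 32
gpr[5] = 0x80002000
target = jalr(gpr, pc=0x80001000, rs=5)
print(hex(fetch(target)), hex(gpr[31]))
```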
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JR 



Jump Register 

 31     26 25     21 20                        6 5       0
| SPECIAL |   rs    |    000 0000 0000 0000     |   JR    |
| 000000  |         |                           | 001000  |
     6         5                 15                  6



Format: 

JR rs 
Description: 

The program unconditionally jumps to the address contained in gen- 
eral register rs, with a delay of one instruction. 
Since instructions must be word-aligned, a Jump Register instruction 
must specify a target register (rs) whose two low-order bits are zero. If 
these low-order bits are not zero, an address exception will occur 
when the jump target instruction is subsequently fetched. 

Operation: 



32, 64 T: temp <- GPR[rs] 
T+1: PC <- temp 



Exceptions: 

None 
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LB 



Load Byte 

 31     26 25     21 20     16 15                        0
|   LB    |  base   |   rt    |          offset           |
| 100000  |         |         |                           |
     6         5         5                 16



Format: 

LB rt, offset(base) 
Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form a virtual address. The contents of the byte at the 
memory location specified by the effective address are sign-extended 
and loaded into general register rt. 

Operation: 



32 T: vAddr <- ((offset_15)^{16} || offset_{15..0}) + GPR[base] 
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA) 
      pAddr <- pAddr_{PSIZE-1..2} || (pAddr_{1..0} xor ReverseEndian^{2}) 
      mem <- LoadMemory (uncached, BYTE, pAddr, vAddr, DATA) 
      byte <- vAddr_{1..0} xor BigEndianCPU^{2} 
      GPR[rt] <- (mem_{7+8*byte})^{24} || mem_{7+8*byte..8*byte} 

64 T: vAddr <- ((offset_15)^{48} || offset_{15..0}) + GPR[base] 
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA) 
      pAddr <- pAddr_{PSIZE-1..3} || (pAddr_{2..0} xor ReverseEndian^{3}) 
      mem <- LoadMemory (uncached, BYTE, pAddr, vAddr, DATA) 
      byte <- vAddr_{2..0} xor BigEndianCPU^{3} 
      GPR[rt] <- (mem_{7+8*byte})^{56} || mem_{7+8*byte..8*byte} 



Exceptions: 

TLB refill exception 
TLB invalid exception 
Bus error exception 
Address error exception 
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L B U Load B y te Unsigned 




 31     26 25     21 20     16 15                        0
|   LBU   |  base   |   rt    |          offset           |
| 100100  |         |         |                           |
     6         5         5                 16





Format: 

LBU rt, offset(base) 
Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form a virtual address. The contents of the byte at the 
memory location specified by the effective address are zero-extended 
and loaded into general register rt. 

Operation: 



32 T: vAddr <- ((offset_15)^{16} || offset_{15..0}) + GPR[base] 
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA) 
      pAddr <- pAddr_{PSIZE-1..2} || (pAddr_{1..0} xor ReverseEndian^{2}) 
      mem <- LoadMemory (uncached, BYTE, pAddr, vAddr, DATA) 
      byte <- vAddr_{1..0} xor BigEndianCPU^{2} 
      GPR[rt] <- 0^{24} || mem_{7+8*byte..8*byte} 

64 T: vAddr <- ((offset_15)^{48} || offset_{15..0}) + GPR[base] 
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA) 
      pAddr <- pAddr_{PSIZE-1..3} || (pAddr_{2..0} xor ReverseEndian^{3}) 
      mem <- LoadMemory (uncached, BYTE, pAddr, vAddr, DATA) 
      byte <- vAddr_{2..0} xor BigEndianCPU^{3} 
      GPR[rt] <- 0^{56} || mem_{7+8*byte..8*byte} 



Exceptions: 

TLB refill exception 
TLB invalid exception 
Bus error exception 
Address error exception 
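
LB and LBU differ only in how the selected byte is widened to the register size. The Python sketch below shows the byte-lane selection and the two extensions for a doubleword returned by memory. All function names are illustrative, and `mem` is assumed to be the value returned by LoadMemory with bit 0 as its least-significant bit.

```python
def byte_lane(vaddr: int, big_endian_cpu: bool, width: int = 64) -> int:
    """Select the byte lane: low address bits XORed with replicated BigEndianCPU."""
    lane_bits = 3 if width == 64 else 2
    lane_mask = (1 << lane_bits) - 1
    return (vaddr & lane_mask) ^ (lane_mask if big_endian_cpu else 0)


def extract_byte(mem: int, byte: int) -> int:
    """Return bits 7+8*byte .. 8*byte of mem."""
    return (mem >> (8 * byte)) & 0xFF


def lb(mem: int, byte: int, width: int = 64) -> int:
    """Sign-extend the selected byte to the register width."""
    value = extract_byte(mem, byte)
    if value & 0x80:
        value -= 0x100
    return value & ((1 << width) - 1)


def lbu(mem: int, byte: int) -> int:
    """Zero-extend the selected byte."""
    return extract_byte(mem, byte)


mem = 0x1122334455667780
print(hex(lb(mem, 0)), hex(lbu(mem, 0)))
```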
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LD 



Load Doubleword 

 31     26 25     21 20     16 15                        0
|   LD    |  base   |   rt    |          offset           |
| 110111  |         |         |                           |
     6         5         5                 16



Format: 

LD rt, offset(base) 
Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form a virtual address. The contents of the 64-bit dou- 
bleword at the memory location specified by the effective address are 
loaded into general register rt. 

If any of the three least-significant bits of the effective address are non- 
zero, an address error exception occurs. 

This operation is only defined for the R4000 operating in 64-bit mode. 
Execution of this instruction in 32-bit mode causes a reserved instruc- 
tion exception. 

Operation: 



64 T: vAddr <- ((offset_15)^{48} || offset_{15..0}) + GPR[base] 
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA) 
      mem <- LoadMemory (uncached, DOUBLEWORD, pAddr, vAddr, DATA) 
      GPR[rt] <- mem 



Exceptions: 

TLB refill exception 

TLB invalid exception 

Bus error exception 

Address error exception 

Reserved instruction exception (R4000 in 32-bit user mode, 
R4000 in 32-bit supervisor mode) 
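
The address formation and alignment rule shared by the load instructions can be sketched as follows. The helper names and the `AddressError` class are assumptions made for illustration; the sketch covers only the sign-extended offset addition and the low-order-bit check, not address translation.

```python
class AddressError(Exception):
    """Stands in for the address error exception."""


def effective_address(base: int, offset16: int, bits: int = 64) -> int:
    """Add the sign-extended 16-bit offset to the base register value."""
    offset = offset16 & 0xFFFF
    if offset & 0x8000:
        offset -= 0x10000  # sign-extend
    return (base + offset) & ((1 << bits) - 1)


def check_alignment(vaddr: int, size: int) -> None:
    """Raise AddressError unless vaddr is a multiple of size (2, 4 or 8)."""
    if vaddr & (size - 1):
        raise AddressError(f"{vaddr:#x} is not aligned to {size} bytes")


vaddr = effective_address(0x1000, 0xFFF8)  # offset field encodes -8
check_alignment(vaddr, 8)                  # LD requires the low three bits to be zero
print(hex(vaddr))
```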
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LDCz 



Load Doubleword To Coprocessor 

 31     26 25     21 20     16 15                        0
|  LDCz   |  base   |   rt    |          offset           |
| 1101xx* |         |         |                           |
     6         5         5                 16



Format: 

LDCz rt, offset(base) 
Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form a virtual address. The processor reads a double- 
word from the addressed memory location and makes the data avail- 
able to coprocessor unit z. The manner in which each coprocessor uses 
the data is defined by the individual coprocessor specifications. 
If any of the three least-significant bits of the effective address are non- 
zero, an address error exception takes place. 
This instruction is not valid for use with CP0. 
This instruction is undefined when the least-significant bit of the 
rt-field is non-zero. 

Execution of the instruction referencing coprocessor 3 causes a re- 
served instruction exception, not a coprocessor unusable exception. 



*See the table "Opcode Bit Encoding" on next page, or "CPU Instruc- 
tion Opcode Bit Encoding" at the end of Appendix A. 
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LDCz 






Operation: 



32 T: vAddr <- ((offset_15)^{16} || offset_{15..0}) + GPR[base] 
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA) 
      mem <- LoadMemory (uncached, DOUBLEWORD, pAddr, vAddr, DATA) 
      COPzLD (rt, mem) 

64 T: vAddr <- ((offset_15)^{48} || offset_{15..0}) + GPR[base] 
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA) 
      mem <- LoadMemory (uncached, DOUBLEWORD, pAddr, vAddr, DATA) 
      COPzLD (rt, mem) 



Exceptions: 

TLB refill exception 

TLB invalid exception 

Bus error exception 

Address error exception 

Coprocessor unusable exception 

Reserved instruction exception (coprocessor 3) 



Opcode Bit Encoding: 

LDCz    Bit #  31 30 29 28   27 26 
LDC1            1  1  0  1    0  1 
LDC2            1  1  0  1    1  0 
LDC3            1  1  0  1    1  1 
               \__________/  \____/ 
                  Opcode     Coprocessor Unit Number 
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Load Doubleword Left 



 31     26 25     21 20     16 15                        0
|   LDL   |  base   |   rt    |          offset           |
| 011010  |         |         |                           |
     6         5         5                 16



Format: 



LDL rt, offset(base) 



Description: 

This instruction can be used in combination with the LDR instruction 
to load a register with eight consecutive bytes from memory, when the 
bytes cross a boundary between two doublewords. LDL loads the left 
portion of the register from the appropriate part of the high-order 
doubleword; LDR loads the right portion of the register from the ap- 
propriate part of the low-order doubleword. 

The LDL instruction adds its sign-extended 16-bit offset to the contents 
of general register base to form a virtual address which can specify an 
arbitrary byte. It reads bytes only from the doubleword in memory 
which contains the specified starting byte. From one to eight bytes will 
be loaded, depending on the starting byte specified. 
Conceptually, it starts at the specified byte in memory and loads that 
byte into the high-order (left-most) byte of the register; then it pro- 
ceeds toward the low-order byte of the doubleword in memory and 
the low-order byte of the register, loading bytes from memory into the 
register until it reaches the low-order byte of the doubleword in mem- 
ory. The least-significant (right-most) byte(s) of the register will not 
be changed. 



                      memory (big-endian) 
    address 8    |  8 |  9 | 10 | 11 | 12 | 13 | 14 | 15 | 
    address 0    |  0 |  1 |  2 |  3 |  4 |  5 |  6 |  7 | 

                      register 
    before       |  A |  B |  C |  D |  E |  F |  G |  H |   $24 

                      LDL $24,3($0) 

    after        |  3 |  4 |  5 |  6 |  7 |  F |  G |  H |   $24 
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■ r%i Load Doubleword Left |_DI_ 




The contents of general register rt are internally bypassed within the 
processor so that no NOP is needed between an immediately preced- 
ing load instruction which specifies register rt and a following LDL (or 
LDR) instruction which also specifies register rt. 
No address exceptions due to alignment are possible. 
This operation is only defined for the R4000 operating in 64-bit mode. 
Execution of this instruction in 32-bit mode causes a reserved instruc- 
tion exception. 

Operation: 



64 T: vAddr <- ((offset_15)^{48} || offset_{15..0}) + GPR[base] 
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA) 
      pAddr <- pAddr_{PSIZE-1..3} || (pAddr_{2..0} xor ReverseEndian^{3}) 
      if BigEndianMem = 0 then 
          pAddr <- pAddr_{PSIZE-1..3} || 0^{3} 
      endif 
      byte <- vAddr_{2..0} xor BigEndianCPU^{3} 
      mem <- LoadMemory (uncached, byte, pAddr, vAddr, DATA) 
      GPR[rt] <- mem_{7+8*byte..0} || GPR[rt]_{55-8*byte..0} 
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LDL 






Given a doubleword in a register and a doubleword in memory, the 
operation of LDL is as follows: 



LDL 
    Register   A B C D E F G H 
    Memory     I J K L M N O P 

              BigEndianCPU = 0                 BigEndianCPU = 1 
vAddr_{2..0}  destination      type  offset    destination      type  offset 
                                     LEM BEM                          LEM BEM 
     0        P B C D E F G H   0     0   7    I J K L M N O P   7     0   0 
     1        O P C D E F G H   1     0   6    J K L M N O P H   6     0   1 
     2        N O P D E F G H   2     0   5    K L M N O P G H   5     0   2 
     3        M N O P E F G H   3     0   4    L M N O P F G H   4     0   3 
     4        L M N O P F G H   4     0   3    M N O P E F G H   3     0   4 
     5        K L M N O P G H   5     0   2    N O P D E F G H   2     0   5 
     6        J K L M N O P H   6     0   1    O P C D E F G H   1     0   6 
     7        I J K L M N O P   7     0   0    P B C D E F G H   0     0   7 

LEM     BigEndianMem = 0 
BEM     BigEndianMem = 1 
Type    AccessType (see Figure 2-2) sent to memory 
Offset  pAddr_{2..0} sent to memory 

Exceptions: 

TLB refill exception 

TLB invalid exception 

Bus error exception 

Address error exception 

Reserved instruction exception (R4000 in 32-bit mode) 
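
The destination column of the LDL table can be derived from the byte index alone. The Python sketch below does so under the assumption that the register and the memory doubleword are written as eight-character strings, most-significant byte first, as in the table. The function name `ldl_destination` is hypothetical.

```python
def ldl_destination(reg: str, mem: str, vaddr: int, big_endian_cpu: int) -> str:
    """Return the register contents after LDL, as an 8-character string."""
    if len(reg) != 8 or len(mem) != 8:
        raise ValueError("reg and mem must each describe eight bytes")
    # byte <- vAddr(2..0) xor BigEndianCPU replicated three times
    byte = (vaddr & 7) ^ (7 if big_endian_cpu else 0)
    taken = byte + 1  # number of low-order memory bytes that are loaded
    # The loaded bytes fill the left of the register; the rest is unchanged.
    return mem[8 - taken:] + reg[taken:]


for vaddr in range(8):
    print(vaddr,
          ldl_destination("ABCDEFGH", "IJKLMNOP", vaddr, 0),
          ldl_destination("ABCDEFGH", "IJKLMNOP", vaddr, 1))
```

Printing all eight rows gives a quick way to cross-check a transcription of the table against the operation it is meant to describe.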
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LDR 



Load Doubleword Right 

 31     26 25     21 20     16 15                        0
|   LDR   |  base   |   rt    |          offset           |
| 011011  |         |         |                           |
     6         5         5                 16



Format: 



LDR rt, offset(base) 



Description: 

This instruction can be used in combination with the LDL instruction 
to load a register with eight consecutive bytes from memory, when the 
bytes cross a boundary between two doublewords. LDR loads the 
right portion of the register from the appropriate part of the low-order 
doubleword; LDL loads the left portion of the register from the appro- 
priate part of the high-order doubleword. 

The LDR instruction adds its sign-extended 16-bit offset to the con- 
tents of general register base to form a virtual address which can spec- 
ify an arbitrary byte. It reads bytes only from the doubleword in mem- 
ory which contains the specified starting byte. From one to eight bytes 
will be loaded, depending on the starting byte specified. 

Conceptually, it starts at the specified byte in memory and loads that 
byte into the low-order (right-most) byte of the register; then it pro- 
ceeds toward the high-order byte of the doubleword in memory and 
the high-order byte of the register, loading bytes from memory into 
the register until it reaches the high-order byte of the doubleword in 
memory. The most significant (left-most) byte(s) of the register will 
not be changed. 



                      memory (big-endian) 
    address 8    |  8 |  9 | 10 | 11 | 12 | 13 | 14 | 15 | 
    address 0    |  0 |  1 |  2 |  3 |  4 |  5 |  6 |  7 | 

                      register 
    before       |  A |  B |  C |  D |  E |  F |  G |  H |   $24 

                      LDR $24,4($0) 

    after        |  A |  B |  C |  0 |  1 |  2 |  3 |  4 |   $24 
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I H R Load Doubleword Right L D R 

(continued) 



The contents of general register rt are internally bypassed within the 
processor so that no NOP is needed between an immediately preced- 
ing load instruction which specifies register rt and a following LDR (or 
LDL) instruction which also specifies register rt. 
No address exceptions due to alignment are possible. 
This operation is only defined for the R4000 operating in 64-bit mode. 
Execution of this instruction in 32-bit mode causes a reserved instruc- 
tion exception. 
Operation: 



64 T: vAddr <- ((offset_15)^{48} || offset_{15..0}) + GPR[base] 
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA) 
      pAddr <- pAddr_{PSIZE-1..3} || (pAddr_{2..0} xor ReverseEndian^{3}) 
      if BigEndianMem = 1 then 
          pAddr <- pAddr_{31..3} || 0^{3} 
      endif 
      byte <- vAddr_{2..0} xor BigEndianCPU^{3} 
      mem <- LoadMemory (uncached, byte, pAddr, vAddr, DATA) 
      GPR[rt] <- GPR[rt]_{63..64-8*byte} || mem_{63..8*byte} 
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LDR 






Given a doubleword in a register and a doubleword in memory, the 
operation of LDR is as follows: 



LDR 
    Register   A B C D E F G H 
    Memory     I J K L M N O P 

              BigEndianCPU = 0                 BigEndianCPU = 1 
vAddr_{2..0}  destination      type  offset    destination      type  offset 
                                     LEM BEM                          LEM BEM 
     0        I J K L M N O P   7     0   0    A B C D E F G I   0     7   0 
     1        A I J K L M N O   6     1   0    A B C D E F I J   1     6   0 
     2        A B I J K L M N   5     2   0    A B C D E I J K   2     5   0 
     3        A B C I J K L M   4     3   0    A B C D I J K L   3     4   0 
     4        A B C D I J K L   3     4   0    A B C I J K L M   4     3   0 
     5        A B C D E I J K   2     5   0    A B I J K L M N   5     2   0 
     6        A B C D E F I J   1     6   0    A I J K L M N O   6     1   0 
     7        A B C D E F G I   0     7   0    I J K L M N O P   7     0   0 

LEM     BigEndianMem = 0 
BEM     BigEndianMem = 1 
Type    AccessType (see Figure 2-2) sent to memory 
Offset  pAddr_{2..0} sent to memory 

Exceptions: 

TLB refill exception 

TLB invalid exception 

Bus error exception 

Address error exception 

Reserved instruction exception (R4000 in 32-bit mode) 
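
The way LDL and LDR cooperate to load a misaligned doubleword can be shown with a short Python sketch. It assumes a big-endian processor and memory, as in the LDL and LDR figures, and represents the register as eight bytes with the most-significant byte first. The function names `ldl_be` and `ldr_be` are hypothetical.

```python
def ldl_be(reg: bytes, memory: bytes, addr: int) -> bytes:
    """Big-endian LDL: bytes from addr to the end of its doubleword go to the left."""
    end = (addr | 7) + 1           # first address past the containing doubleword
    chunk = memory[addr:end]
    return chunk + reg[len(chunk):]


def ldr_be(reg: bytes, memory: bytes, addr: int) -> bytes:
    """Big-endian LDR: bytes from the start of the doubleword up to addr go to the right."""
    start = addr & ~7              # first address of the containing doubleword
    chunk = memory[start:addr + 1]
    return reg[:8 - len(chunk)] + chunk


memory = bytes(range(16))          # each byte holds its own address
reg = b"ABCDEFGH"

# Load the eight bytes at addresses 3..10, which straddle two doublewords.
reg = ldl_be(reg, memory, 3)       # left part from the doubleword at 0
reg = ldr_be(reg, memory, 3 + 7)   # right part from the doubleword at 8
print(list(reg))
```

The pair is issued with addresses seven bytes apart, so between them the two instructions touch every byte of the misaligned doubleword exactly once.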
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LH 



Load Halfword 

 31     26 25     21 20     16 15                        0
|   LH    |  base   |   rt    |          offset           |
| 100001  |         |         |                           |
     6         5         5                 16



Format: 

LH rt, offset(base) 
Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form a virtual address. The contents of the halfword at 
the memory location specified by the effective address are sign-ex- 
tended and loaded into general register rt. 

If the least-significant bit of the effective address is non-zero, an ad- 
dress error exception occurs. 

Operation: 



32 T: vAddr <- ((offset_15)^{16} || offset_{15..0}) + GPR[base] 
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA) 
      pAddr <- pAddr_{PSIZE-1..2} || (pAddr_{1..0} xor (ReverseEndian || 0)) 
      mem <- LoadMemory (uncached, HALFWORD, pAddr, vAddr, DATA) 
      byte <- vAddr_{1..0} xor (BigEndianCPU || 0) 
      GPR[rt] <- (mem_{15+8*byte})^{16} || mem_{15+8*byte..8*byte} 

64 T: vAddr <- ((offset_15)^{48} || offset_{15..0}) + GPR[base] 
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA) 
      pAddr <- pAddr_{PSIZE-1..3} || (pAddr_{2..0} xor (ReverseEndian^{2} || 0)) 
      mem <- LoadMemory (uncached, HALFWORD, pAddr, vAddr, DATA) 
      byte <- vAddr_{2..0} xor (BigEndianCPU^{2} || 0) 
      GPR[rt] <- (mem_{15+8*byte})^{48} || mem_{15+8*byte..8*byte} 



Exceptions: 

TLB refill exception 
TLB invalid exception 
Bus error exception 
Address error exception 
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LHU 



Load Halfword Unsigned 

 31     26 25     21 20     16 15                        0
|   LHU   |  base   |   rt    |          offset           |
| 100101  |         |         |                           |
     6         5         5                 16







Format: 



LHU rt, offset(base) 



Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form a virtual address. The contents of the halfword at 
the memory location specified by the effective address are zero-ex- 
tended and loaded into general register rt. 

If the least-significant bit of the effective address is non-zero, an ad- 
dress error exception occurs. 

Operation: 



32 T: vAddr <- ((offset_15)^{16} || offset_{15..0}) + GPR[base] 
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA) 
      pAddr <- pAddr_{PSIZE-1..2} || (pAddr_{1..0} xor (ReverseEndian || 0)) 
      mem <- LoadMemory (uncached, HALFWORD, pAddr, vAddr, DATA) 
      byte <- vAddr_{1..0} xor (BigEndianCPU || 0) 
      GPR[rt] <- 0^{16} || mem_{15+8*byte..8*byte} 

64 T: vAddr <- ((offset_15)^{48} || offset_{15..0}) + GPR[base] 
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA) 
      pAddr <- pAddr_{PSIZE-1..3} || (pAddr_{2..0} xor (ReverseEndian^{2} || 0)) 
      mem <- LoadMemory (uncached, HALFWORD, pAddr, vAddr, DATA) 
      byte <- vAddr_{2..0} xor (BigEndianCPU^{2} || 0) 
      GPR[rt] <- 0^{48} || mem_{15+8*byte..8*byte} 



Exceptions: 

TLB refill exception 
TLB invalid exception 
Bus error exception 
Address error exception 
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LL 



Load Linked 



LL 




Format: 



LL rt, offset(base) 



Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form a virtual address. The contents of the word at the 
memory location specified by the effective address are loaded into 
general register rt. In 64-bit mode, the loaded word is sign-extended. 
This instruction implicitly performs a SYNC operation; all loads and 
stores to shared memory fetched prior to the LL must access memory 
before the LL, and loads and stores to shared memory fetched subse- 
quent to the LL must access memory after the LL. 
The processor begins checking the accessed word for modification by 
other processors and devices. 

Load Linked and Store Conditional can be used to atomically update 
memory locations: 



L1: 
        LL      T1, (T0) 
        ADD     T2, T1, 1 
        SC      T2, (T0) 
        BEQ     T2, 0, L1 
        NOP 





This atomically increments the word addressed by T0. Changing the 
ADD to an OR changes this to an atomic bit set. 
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I I Load Linked I I 




The operation of LL is undefined if the addressed location is uncached 
and, for synchronization between multiple processors, the operation 
of LL is undefined if the addressed location is noncoherent. Excep- 
tions also cause SC to fail, so persistent exceptions must be avoided. 

This instruction is available in User mode, and it is not necessary for 
CP0 to be enabled. 

If either of the two least-significant bits of the effective address is non- 
zero, an address error exception takes place. 

Operation: 



32 T: vAddr <- ((offset_15)^{16} || offset_{15..0}) + GPR[base] 
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA) 
      mem <- LoadMemory (uncached, WORD, pAddr, vAddr, DATA) 
      GPR[rt] <- mem 
      LLbit <- 1 
      SyncOperation() 

64 T: vAddr <- ((offset_15)^{48} || offset_{15..0}) + GPR[base] 
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA) 
      pAddr <- pAddr_{PSIZE-1..3} || (pAddr_{2..0} xor (ReverseEndian || 0^{2})) 
      mem <- LoadMemory (uncached, WORD, pAddr, vAddr, DATA) 
      byte <- vAddr_{2..0} xor (BigEndianCPU || 0^{2}) 
      GPR[rt] <- (mem_{31+8*byte})^{32} || mem_{31+8*byte..8*byte} 
      LLbit <- 1 
      SyncOperation() 



Exceptions: 

TLB refill exception 
TLB invalid exception 
Bus error exception 
Address error exception 
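
The retry loop built from LL and SC can be modeled in Python to show why the sequence is atomic. The sketch is a deliberately simplified single-processor model: the class `Cpu`, its methods, and the `interference` list are all hypothetical, and SC is assumed to return 1 on success and 0 on failure, matching the BEQ test in the example sequence.

```python
class Cpu:
    """Toy model of one processor with an LLbit and a watched address."""

    def __init__(self, memory: dict):
        self.memory = memory
        self.llbit = 0
        self.ll_addr = None

    def ll(self, addr: int) -> int:
        self.llbit = 1
        self.ll_addr = addr
        return self.memory[addr]

    def sc(self, addr: int, value: int) -> int:
        ok = 1 if self.llbit and self.ll_addr == addr else 0
        if ok:
            self.memory[addr] = value
        self.llbit = 0
        return ok

    def external_write(self, addr: int, value: int) -> None:
        """Another processor or device modifies memory."""
        self.memory[addr] = value
        if addr == self.ll_addr:
            self.llbit = 0


def atomic_increment(cpu: Cpu, addr: int, interference: list) -> int:
    """Mirror of the L1 loop: retry until SC reports success."""
    while True:
        t1 = cpu.ll(addr)
        if interference:
            interference.pop(0)()   # simulate a competing store, once per entry
        t2 = (t1 + 1) & 0xFFFFFFFF
        if cpu.sc(addr, t2):
            return t2


cpu = Cpu({0x100: 41})
print(atomic_increment(cpu, 0x100, [lambda: cpu.external_write(0x100, 100)]))
```

In the run above the first attempt fails because the watched word is modified between LL and SC, so the loop repeats and increments the new value instead.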
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LLD 



Load Linked Doubleword 



LLD 




Format: 



LLD rt, offset(base) 



Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form a virtual address. The contents of the doubleword 
at the memory location specified by the effective address are loaded 
into general register rt. 

This instruction implicitly performs a SYNC operation; all loads and 
stores to shared memory fetched prior to the LLD must access memo- 
ry before the LLD, and loads and stores to shared memory fetched 
subsequent to the LLD must access memory after the LLD. 
The processor begins checking the accessed doubleword for modifica- 
tion by other processors and devices. 

Load Linked Doubleword and Store Conditional Doubleword can be 
used to atomically update memory locations: 



L1: 
        LLD     T1, (T0) 
        ADD     T2, T1, 1 
        SCD     T2, (T0) 
        BEQ     T2, 0, L1 
        NOP 





This atomically increments the doubleword addressed by T0. Changing 
the ADD to an OR changes this to an atomic bit set. 






The operation of LLD is undefined if the addressed location is un- 
cached and, for synchronization between multiple processors, the op- 
eration of LLD is undefined if the addressed location is noncoherent. 
Exceptions also cause SCD to fail, so persistent exceptions must be 
avoided. 

This instruction is available in User mode, and it is not necessary for 
CP0 to be enabled. 

If any of the three least-significant bits of the effective address are non- 
zero, an address error exception takes place. 

This operation is only defined for the R4000 operating in 64-bit mode. 
Execution of this instruction in 32-bit mode causes a reserved instruc- 
tion exception. 
Operation: 



64 T: vAddr <- ((offset_15)^{48} || offset_{15..0}) + GPR[base] 
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA) 
      mem <- LoadMemory (uncached, DOUBLEWORD, pAddr, vAddr, DATA) 
      GPR[rt] <- mem 
      LLbit <- 1 
      SyncOperation() 



Exceptions: 

TLB refill exception 
TLB invalid exception 
Bus error exception 
Address error exception 



Reserved instruction exception (R4000 in 32-bit mode) 
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LUI 



Load Upper Immediate 

 31     26 25     21 20     16 15                        0
|   LUI   |    0    |   rt    |         immediate         |
| 001111  |  00000  |         |                           |
     6         5         5                 16



Format: 

LUI rt, immediate 
Description: 

The 16-bit immediate is shifted left 16 bits and concatenated to 16 bits 
of zeros. The result is placed into general register rt. In 64-bit mode, 
the loaded word is sign-extended. 

Operation: 



32 T: GPR[rt] <- immediate || 0^{16} 

64 T: GPR[rt] <- (immediate_15)^{32} || immediate || 0^{16} 



Exceptions: 

None 
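
A short Python sketch makes the two cases of LUI concrete. The function name `lui` and its `bits` parameter are illustrative assumptions.

```python
def lui(immediate: int, bits: int = 64) -> int:
    """Place the 16-bit immediate in bits 31..16 and clear bits 15..0."""
    word = (immediate & 0xFFFF) << 16
    if bits == 64 and word & 0x80000000:
        # In 64-bit mode the 32-bit result is sign-extended.
        word |= 0xFFFFFFFF00000000
    return word


print(hex(lui(0x1234)), hex(lui(0x8000)), hex(lui(0x8000, bits=32)))
```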
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LW 



Load Word 

 31     26 25     21 20     16 15                        0
|   LW    |  base   |   rt    |          offset           |
| 100011  |         |         |                           |
     6         5         5                 16



Format: 

LW rt, offset(base) 
Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form a virtual address. The contents of the word at the 
memory location specified by the effective address are loaded into 
general register rt. In 64-bit mode, the loaded word is sign-extended. 
If either of the two least-significant bits of the effective address is non- 
zero, an address error exception occurs. 

Operation: 



32 T: vAddr <- ((offset_15)^{16} || offset_{15..0}) + GPR[base] 
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA) 
      mem <- LoadMemory (uncached, WORD, pAddr, vAddr, DATA) 
      GPR[rt] <- mem 

64 T: vAddr <- ((offset_15)^{48} || offset_{15..0}) + GPR[base] 
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA) 
      pAddr <- pAddr_{PSIZE-1..3} || (pAddr_{2..0} xor (ReverseEndian || 0^{2})) 
      mem <- LoadMemory (uncached, WORD, pAddr, vAddr, DATA) 
      byte <- vAddr_{2..0} xor (BigEndianCPU || 0^{2}) 
      GPR[rt] <- (mem_{31+8*byte})^{32} || mem_{31+8*byte..8*byte} 



Exceptions: 

TLB refill exception 
TLB invalid exception 
Bus error exception 
Address error exception 
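
In 64-bit mode LW must pick one of the two words in the doubleword returned by memory and then sign-extend it. The Python sketch below illustrates both steps. The function names are hypothetical, and `mem` is assumed to be the 64-bit value returned by LoadMemory with bit 0 as its least-significant bit.

```python
def word_lane(vaddr: int, big_endian_cpu: bool) -> int:
    """byte <- vAddr(2..0) xor (BigEndianCPU || 00): only bit 2 is flipped."""
    return (vaddr & 7) ^ (4 if big_endian_cpu else 0)


def lw64(mem: int, byte: int) -> int:
    """Extract bits 31+8*byte .. 8*byte of mem and sign-extend to 64 bits."""
    word = (mem >> (8 * byte)) & 0xFFFFFFFF
    if word & 0x80000000:
        word |= 0xFFFFFFFF00000000
    return word


mem = 0x800000017FFFFFFF
# A word-aligned address has low bits 000 or 100, so byte is 0 or 4.
print(hex(lw64(mem, word_lane(0, big_endian_cpu=False))))
print(hex(lw64(mem, word_lane(0, big_endian_cpu=True))))
```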
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LWCz 



Load Word To Coprocessor 

 31     26 25     21 20     16 15                        0
|  LWCz   |  base   |   rt    |          offset           |
| 1100xx* |         |         |                           |
     6         5         5                 16







Format: 

LWCz rt, offset(base) 

Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form a virtual address. The processor reads a word 
from the addressed memory location, and makes the data available to 
coprocessor unit z. The manner in which each coprocessor uses the 
data is defined by the individual coprocessor specifications. 
If either of the two least-significant bits of the effective address is non- 
zero, an address error exception occurs. 
This instruction is not valid for use with CP0. 



*See the table "Opcode Bit Encoding" on next page, or "CPU Instruc- 
tion Opcode Bit Encoding" at the end of Appendix A. 
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LWCz 






Operation: 



32 T: vAddr <- ((offset_15)^{16} || offset_{15..0}) + GPR[base] 
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA) 
      byte <- vAddr_{1..0} 
      mem <- LoadMemory (uncached, DOUBLEWORD, pAddr, vAddr, DATA) 
      COPzLW (rt, mem) 

64 T: vAddr <- ((offset_15)^{48} || offset_{15..0}) + GPR[base] 
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA) 
      pAddr <- pAddr_{PSIZE-1..3} || (pAddr_{2..0} xor (ReverseEndian || 0^{2})) 
      mem <- LoadMemory (uncached, WORD, pAddr, vAddr, DATA) 
      byte <- vAddr_{2..0} xor (BigEndianCPU || 0^{2}) 
      COPzLW (byte, rt, mem) 



Exceptions: 

TLB refill exception 
TLB invalid exception 
Bus error exception 
Address error exception 
Coprocessor unusable exception 

Opcode Bit Encoding: 

LWCz    Bit #  31 30 29 28   27 26 
LWC1            1  1  0  0    0  1 
LWC2            1  1  0  0    1  0 
LWC3            1  1  0  0    1  1 
               \__________/  \____/ 
                  Opcode     Coprocessor Unit Number 
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LWL 



Load Word Left 

 31     26 25     21 20     16 15                        0
|   LWL   |  base   |   rt    |          offset           |
| 100010  |         |         |                           |
     6         5         5                 16





Format: 



LWL rt, offset(base) 



Description: 

This instruction can be used in combination with the LWR instruction 
to load a register with four consecutive bytes from memory, when the 
bytes cross a boundary between two words. LWL loads the left por- 
tion of the register from the appropriate part of the high-order word; 
LWR loads the right portion of the register from the appropriate part 
of the low-order word. 

The LWL instruction adds its sign-extended 16-bit offset to the con- 
tents of general register base to form a virtual address which can spec- 
ify an arbitrary byte. It reads bytes only from the word in memory 
which contains the specified starting byte. From one to four bytes will 
be loaded, depending on the starting byte specified. In 64-bit mode, 
the loaded word is sign-extended. 

Conceptually, it starts at the specified byte in memory and loads that 
byte into the high-order (left-most) byte of the register; then it pro- 
ceeds toward the low-order byte of the word in memory and the low- 
order byte of the register, loading bytes from memory into the register 
until it reaches the low-order byte of the word in memory. The least- 
significant (right-most) byte(s) of the register will not be changed. 



              memory (big-endian)
address 4:    | 4 | 5 | 6 | 7 |
address 0:    | 0 | 1 | 2 | 3 |

              register $24
before:       | A | B | C | D |

      LWL $24,1($0)

after:        | 1 | 2 | 3 | D |
LWL    Load Word Left (continued)    LWL



The contents of general register rt are internally bypassed within the 
processor so that no NOP is needed between an immediately preced- 
ing load instruction which specifies register rt and a following LWL 
(or LWR) instruction which also specifies register rt. 
No address exceptions due to alignment are possible. 
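
The byte selection performed by LWL can be sketched in Python. This is a minimal model that assumes a big-endian CPU in 32-bit mode; the function name lwl_be32 and its arguments are illustrative and are not part of the R4000 specification.

```python
def lwl_be32(reg: int, mem_word: bytes, vaddr: int) -> int:
    """Model LWL for a big-endian CPU in 32-bit mode (illustrative only).

    reg      -- current 32-bit contents of GPR[rt]
    mem_word -- the 4 bytes of the aligned word that contains vaddr
    vaddr    -- byte address; only its two low-order bits are used
    """
    start = vaddr & 3                        # index of the first byte to load
    loaded = int.from_bytes(mem_word[start:], "big")
    keep_bits = 8 * start                    # low-order register bits left unchanged
    keep_mask = (1 << keep_bits) - 1
    return ((loaded << keep_bits) | (reg & keep_mask)) & 0xFFFFFFFF


# Word at address 0 holds bytes 0, 1, 2, 3; $24 holds A, B, C, D.
before = int.from_bytes(b"ABCD", "big")
after = lwl_be32(before, bytes([0, 1, 2, 3]), 1)   # LWL $24,1($0)
print(after.to_bytes(4, "big"))                    # b'\x01\x02\x03D'
```

Bytes 1, 2, and 3 of the word replace the three high-order bytes of the register, and the low-order byte (D) is preserved.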
Operation:

32  T:  vAddr <- ((offset_15)^16 || offset_15..0) + GPR[base]
        (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
        pAddr <- pAddr_PSIZE-1..2 || (pAddr_1..0 xor ReverseEndian^2)
        if BigEndianMem = 0 then
            pAddr <- pAddr_PSIZE-1..2 || 0^2
        endif
        byte <- vAddr_1..0 xor BigEndianCPU^2
        mem <- LoadMemory (uncached, byte, pAddr, vAddr, DATA)
        GPR[rt] <- mem_7+8*byte..0 || GPR[rt]_23-8*byte..0

64  T:  vAddr <- ((offset_15)^48 || offset_15..0) + GPR[base]
        (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
        pAddr <- pAddr_PSIZE-1..3 || (pAddr_2..0 xor ReverseEndian^3)
        if BigEndianMem = 0 then
            pAddr <- pAddr_PSIZE-1..3 || 0^3
        endif
        byte <- vAddr_1..0 xor BigEndianCPU^2
        word <- vAddr_2 xor BigEndianCPU
        mem <- LoadMemory (uncached, 0 || byte, pAddr, vAddr, DATA)
        temp <- mem_7+8*byte+32*word..32*word || GPR[rt]_23-8*byte..0
        GPR[rt] <- (temp_31)^32 || temp
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LWL 



Load Word Left 
(continued) 



LWL 



Given a doubleword in a register and a doubleword in memory, the 
operation of LWL is as follows: 



LWL
Register    A B C D E F G H
Memory      I J K L M N O P

             BigEndianCPU = 0                BigEndianCPU = 1
vAddr_2..0   destination  type  offset       destination  type  offset
                                LEM  BEM                        LEM  BEM
    0        SSSSPFGH      0     0    7      SSSSIJKL      3     4    0
    1        SSSSOPGH      1     0    6      SSSSJKLH      2     4    1
    2        SSSSNOPH      2     0    5      SSSSKLGH      1     4    2
    3        SSSSMNOP      3     0    4      SSSSLFGH      0     4    3
    4        SSSSLFGH      0     4    3      SSSSMNOP      3     0    4
    5        SSSSKLGH      1     4    2      SSSSNOPH      2     0    5
    6        SSSSJKLH      2     4    1      SSSSOPGH      1     0    6
    7        SSSSIJKL      3     4    0      SSSSPFGH      0     0    7

LEM     BigEndianMem = 0
BEM     BigEndianMem = 1
Type    AccessType (see Figure 2-2) sent to memory
Offset  pAddr_2..0 sent to memory
S       sign-extend of destination_31

Exceptions: 

TLB refill exception 
TLB invalid exception 
Bus error exception 
Address error exception 
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LWR 



Load Word Right 



LWR 



Bits     31..26   25..21   20..16   15..0
Field    LWR      base     rt       offset
Value    100110
Width    6        5        5        16



Format: 



LWR rt, offset(base) 



Description: 

This instruction can be used in combination with the LWL instruction 
to load a register with four consecutive bytes from memory, when the 
bytes cross a boundary between two words. LWR loads the right por- 
tion of the register from the appropriate part of the low-order word; 
LWL loads the left portion of the register from the appropriate part of 
the high-order word. 

The LWR instruction adds its sign-extended 16-bit offset to the con- 
tents of general register base to form a virtual address which can spec- 
ify an arbitrary byte. It reads bytes only from the word in memory 
which contains the specified starting byte. From one to four bytes will 
be loaded, depending on the starting byte specified. In 64-bit mode, 
the loaded word is sign-extended. 

Conceptually, it starts at the specified byte in memory and loads that 
byte into the low-order (right-most) byte of the register; then it pro- 
ceeds toward the high-order byte of the word in memory and the 
high-order byte of the register, loading bytes from memory into the 
register until it reaches the high-order byte of the word in memory. 
The most significant (left-most) byte(s) of the register will not be 
changed. 



              memory (big-endian)
address 4:    | 4 | 5 | 6 | 7 |
address 0:    | 0 | 1 | 2 | 3 |

              register $24
before:       | A | B | C | D |

      LWR $24,4($0)

after:        | A | B | C | 4 |
LWR    Load Word Right (continued)    LWR



The contents of general register rt are internally bypassed within the 
processor so that no NOP is needed between an immediately preced- 
ing load instruction which specifies register rt and a following LWR 
(or LWL) instruction which also specifies register rt. 
No address exceptions due to alignment are possible. 
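
The way LWL and LWR combine to load an unaligned word can be sketched as follows. This is a minimal Python model that assumes a big-endian CPU in 32-bit mode; the helper names (lwl_be32, lwr_be32, load_unaligned_be32) are illustrative and do not come from the R4000 specification.

```python
def lwl_be32(reg: int, mem_word: bytes, vaddr: int) -> int:
    """LWL, big-endian, 32-bit: fill the left part of reg from memory."""
    start = vaddr & 3
    loaded = int.from_bytes(mem_word[start:], "big")
    keep_bits = 8 * start
    return ((loaded << keep_bits) | (reg & ((1 << keep_bits) - 1))) & 0xFFFFFFFF


def lwr_be32(reg: int, mem_word: bytes, vaddr: int) -> int:
    """LWR, big-endian, 32-bit: fill the right part of reg from memory."""
    n = (vaddr & 3) + 1                      # bytes from start of word to vaddr
    loaded = int.from_bytes(mem_word[:n], "big")
    keep_mask = 0xFFFFFFFF & ~((1 << (8 * n)) - 1)
    return (reg & keep_mask) | loaded


def load_unaligned_be32(memory: bytes, addr: int) -> int:
    """Equivalent of: LWL rt, addr ; LWR rt, addr+3 (big-endian)."""
    def word_at(a: int) -> bytes:
        base = a & ~3
        return memory[base:base + 4]

    reg = lwl_be32(0, word_at(addr), addr)
    return lwr_be32(reg, word_at(addr + 3), addr + 3)


memory = bytes(range(8))                     # byte i holds the value i
print(hex(lwr_be32(0x41424344, memory[4:8], 4)))   # 0x41424304
print(hex(load_unaligned_be32(memory, 1)))         # 0x1020304
```

The first call corresponds to LWR $24,4($0) with $24 holding A, B, C, D: only the low-order byte is replaced. The second call assembles bytes 1 through 4, which straddle the word boundary at address 4.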
Operation:

32  T:  vAddr <- ((offset_15)^16 || offset_15..0) + GPR[base]
        (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
        pAddr <- pAddr_PSIZE-1..2 || (pAddr_1..0 xor ReverseEndian^2)
        if BigEndianMem = 1 then
            pAddr <- pAddr_PSIZE-1..2 || 0^2
        endif
        byte <- vAddr_1..0 xor BigEndianCPU^2
        mem <- LoadMemory (uncached, byte, pAddr, vAddr, DATA)
        GPR[rt] <- GPR[rt]_31..32-8*byte || mem_31..8*byte

64  T:  vAddr <- ((offset_15)^48 || offset_15..0) + GPR[base]
        (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
        pAddr <- pAddr_PSIZE-1..3 || (pAddr_2..0 xor ReverseEndian^3)
        if BigEndianMem = 1 then
            pAddr <- pAddr_PSIZE-1..3 || 0^3
        endif
        byte <- vAddr_1..0 xor BigEndianCPU^2
        word <- vAddr_2 xor BigEndianCPU
        mem <- LoadMemory (uncached, 0 || byte, pAddr, vAddr, DATA)
        temp <- GPR[rt]_31..32-8*byte || mem_31+32*word..32*word+8*byte
        GPR[rt] <- (temp_31)^32 || temp
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LWR 



Load Word Right 
(continued) 



LWR 



Given a doubleword in a register and a doubleword in memory, the
operation of LWR is as follows:



LWR
Register    A B C D E F G H
Memory      I J K L M N O P

             BigEndianCPU = 0                BigEndianCPU = 1
vAddr_2..0   destination  type  offset       destination  type  offset
                                LEM  BEM                        LEM  BEM
    0        SSSSMNOP      0     0    4      SSSSEFGI      0     7    0
    1        SSSSEMNO      1     1    4      SSSSEFIJ      1     6    0
    2        SSSSEFMN      2     2    4      SSSSEIJK      2     5    0
    3        SSSSEFGM      3     3    4      SSSSIJKL      3     4    0
    4        SSSSIJKL      0     4    0      SSSSEFGM      0     3    4
    5        SSSSEIJK      1     5    0      SSSSEFMN      1     2    4
    6        SSSSEFIJ      2     6    0      SSSSEMNO      2     1    4
    7        SSSSEFGI      3     7    0      SSSSMNOP      3     0    4

LEM     BigEndianMem = 0
BEM     BigEndianMem = 1
Type    AccessType (see Figure 2-2) sent to memory
Offset  pAddr_2..0 sent to memory
S       sign-extend of destination_31

Exceptions: 

TLB refill exception 
TLB invalid exception 
Bus error exception 
Address error exception 
LWU    Load Word Unsigned




LWU 


Bits     31..26   25..21   20..16   15..0
Field    LWU      base     rt       offset
Value    101111
Width    6        5        5        16





Format: 

LWU rt, offset(base) 
Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form a virtual address. The contents of the word at the 
memory location specified by the effective address are loaded into 
general register rt. The loaded word is zero-extended. 

If either of the two least-significant bits of the effective address is non- 
zero, an address error exception occurs. 

This operation is only defined for the R4000 operating in 64-bit mode. 
Execution of this instruction in 32-bit mode causes a reserved instruc- 
tion exception. 
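
The difference between zero-extension (LWU) and the sign-extension applied by the ordinary word load in 64-bit mode can be illustrated with a short Python sketch. The function names and the AddressError class are hypothetical stand-ins for illustration, not part of the architecture definition.

```python
class AddressError(Exception):
    """Stands in for the address error exception (illustrative)."""


def check_word_aligned(vaddr: int) -> None:
    # Either of the two least-significant address bits being non-zero
    # is an alignment violation for a word access.
    if vaddr & 0b11:
        raise AddressError(hex(vaddr))


def lw_result64(word: int) -> int:
    """Sign-extend a loaded 32-bit word to 64 bits (as LW does in 64-bit mode)."""
    word &= 0xFFFFFFFF
    if word & 0x80000000:
        return word | 0xFFFFFFFF00000000
    return word


def lwu_result64(word: int) -> int:
    """Zero-extend a loaded 32-bit word to 64 bits (as LWU does)."""
    return word & 0xFFFFFFFF


print(hex(lw_result64(0x80000001)))    # 0xffffffff80000001
print(hex(lwu_result64(0x80000001)))   # 0x80000001
```

The two results differ only when bit 31 of the loaded word is set.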

Operation: 



64  T:  vAddr <- ((offset_15)^48 || offset_15..0) + GPR[base]
        (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
        pAddr <- pAddr_PSIZE-1..3 || (pAddr_2..0 xor (ReverseEndian || 0^2))
        mem <- LoadMemory (uncached, WORD, pAddr, vAddr, DATA)
        byte <- vAddr_2..0 xor (BigEndianCPU || 0^2)
        GPR[rt] <- 0^32 || mem_31+8*byte..8*byte



Exceptions: 

TLB refill exception 

TLB invalid exception 

Bus error exception 

Address error exception 

Reserved instruction exception (R4000 in 32-bit mode) 
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MFCO 



Move From 
System Control Coprocessor 



MFC0



Bits     31..26   25..21   20..16   15..11   10..0
Field    COP0     MF       rt       rd       0
Value    010000   00000                      000 0000 0000
Width    6        5        5        5        11



Format:

MFC0 rt, rd

Description: 

The contents of coprocessor register rd of the CP0 are loaded into 

general register rt. 
Operation: 



32  T:    data <- CPR[0,rd]
    T+1:  GPR[rt] <- data

64  T:    data <- CPR[0,rd]
    T+1:  GPR[rt] <- (data_31)^32 || data_31..0



Exceptions: 

Coprocessor unusable exception 
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MFCz 



Move From Coprocessor 



MFCz 





Bits     31..26      25..21   20..16   15..11   10..0
Field    COPz        MF       rt       rd       0
Value    0100xx*     00000                      000 0000 0000
Width    6           5        5        5        11





Format: 

MFCz rt,rd 

Description: 

The contents of coprocessor register rd of coprocessor z are loaded into 
general register rt.

Execution of the instruction referencing coprocessor 3 causes a re- 
served instruction exception, not a coprocessor unusable exception. 

Operation: 



32  T:    data <- CPR[z,rd]
    T+1:  GPR[rt] <- data

64  T:    if rd_0 = 0 then
              data <- CPR[z,rd_4..1 || 0]_31..0
          else
              data <- CPR[z,rd_4..1 || 0]_63..32
          endif



Exceptions: 

Coprocessor unusable exception 

Reserved instruction exception (coprocessor 3) 
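
In 64-bit mode the operation reads the even-numbered 64-bit coprocessor register and uses the low-order bit of rd to choose which 32-bit half supplies the data. A minimal Python sketch of that selection follows; the function name and the list used to hold the coprocessor registers are assumptions made for illustration.

```python
def mfcz_data64(cpr: list, rd: int) -> int:
    """Select the 32-bit half of a 64-bit coprocessor register (illustrative).

    cpr -- list of 32 coprocessor general registers, each a 64-bit value
    rd  -- 5-bit register number from the instruction
    """
    even = rd & ~1                    # rd_4..1 || 0
    reg = cpr[even]
    if rd & 1 == 0:
        return reg & 0xFFFFFFFF       # bits 31..0
    return (reg >> 32) & 0xFFFFFFFF   # bits 63..32


cpr = [0] * 32
cpr[4] = 0x1122334455667788
print(hex(mfcz_data64(cpr, 4)))   # 0x55667788
print(hex(mfcz_data64(cpr, 5)))   # 0x11223344
```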



*See the table "Opcode Bit Encoding" on next page, or "CPU Instruction
Opcode Bit Encoding" at the end of Appendix A.
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MFCz 



Move From Coprocessor 
(continued) 



MFCz 



Opcode Bit Encoding:

MFCz    Bit #  31 30 29 28 27 26 25 24 23 22 21
MFC0            0  1  0  0  0  0  0  0  0  0  0
MFC1            0  1  0  0  0  1  0  0  0  0  0
MFC2            0  1  0  0  1  0  0  0  0  0  0
MFC3            0  1  0  0  1  1  0  0  0  0  0

Bits 31..28: Opcode
Bits 27..26: Coprocessor Unit Number
Bits 25..21: Coprocessor Suboperation
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MFHI 



Move From HI 



MFHI 



Bits     31..26    25..16         15..11   10..6    5..0
Field    SPECIAL   0              rd       0        MFHI
Value    000000    00 0000 0000            00000    010000
Width    6         10             5        5        6



Format: 

MFHI rd 
Description: 

The contents of special register HI are loaded into general register rd. 
To ensure proper operation in the event of interruptions, the two in- 
structions which follow a MFHI instruction may not be any of the in- 
structions which modify the HI register: MULT, MULTU, DIV, DIVU, 
MTHI, DMULT, DMULTU, DDIV, DDIVU.

Operation: 



32, 64  T:  GPR[rd] <- HI



Exceptions: 

None 






MFLO 



Move From LO



MFLO 



Bits     31..26    25..16         15..11   10..6    5..0
Field    SPECIAL   0              rd       0        MFLO
Value    000000    00 0000 0000            00000    010010
Width    6         10             5        5        6



Format: 



MFLO rd 



Description: 

The contents of special register LO are loaded into general register rd. 
To ensure proper operation in the event of interruptions, the two in- 
structions which follow a MFLO instruction may not be any of the in- 
structions which modify the LO register: MULT, MULTU, DIV, DIVU, 
MTLO, DMULT, DMULTU, DDIV, DDIVU. 



Operation:

32, 64  T:  GPR[rd] <- LO

Exceptions: 

None 
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MTCO 



Move To 
System Control Coprocessor 



MTC0



Bits     31..26   25..21   20..16   15..11   10..0
Field    COP0     MT       rt       rd       0
Value    010000   00100                      000 0000 0000
Width    6        5        5        5        11



Format:

MTC0 rt, rd

Description:

The contents of general register rt are loaded into coprocessor register
rd of the CP0.

Because the state of the virtual address translation system may be
altered by this instruction, the operation of load instructions, store
instructions, and TLB operations immediately prior to and after this
instruction is undefined.
Operation:

32, 64  T:    data <- GPR[rt]
        T+1:  CPR[0,rd] <- data



Exceptions: 

Coprocessor unusable exception 
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MTCz 



Move To Coprocessor 



MTCz 



Bits     31..26      25..21   20..16   15..11   10..0
Field    COPz        MT       rt       rd       0
Value    0100xx*     00100                      000 0000 0000
Width    6           5        5        5        11



Format: 

MTCz rt, rd 

Description: 

The contents of general register rt are loaded into coprocessor register 
rd of coprocessor z. Execution of the instruction referencing coproces- 
sor 3 causes a reserved instruction exception, not a coprocessor unus- 
able exception. 
Operation:

32  T:    data <- GPR[rt]
    T+1:  CPR[z,rd] <- data

64  T:    data <- GPR[rt]_31..0
    T+1:  if rd_0 = 0 then
              CPR[z,rd_4..1 || 0] <- CPR[z,rd_4..1 || 0]_63..32 || data
          else
              CPR[z,rd_4..1 || 0] <- data || CPR[z,rd_4..1 || 0]_31..0
          endif



Exceptions: 

Coprocessor unusable exception 

Reserved instruction exception (coprocessor 3) 

*Opcode Bit Encoding:

MTCz    Bit #  31 30 29 28 27 26 25 24 23 22 21
COP0            0  1  0  0  0  0  0  0  1  0  0
COP1            0  1  0  0  0  1  0  0  1  0  0
COP2            0  1  0  0  1  0  0  0  1  0  0
COP3            0  1  0  0  1  1  0  0  1  0  0

Bits 31..28: Opcode
Bits 27..26: Coprocessor Unit Number
Bits 25..21: Coprocessor Suboperation
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MTHI 


Move To HI 


MTHI 


Bits     31..26    25..21   20..6                  5..0
Field    SPECIAL   rs       0                      MTHI
Value    000000             000 0000 0000 0000     010001
Width    6         5        15                     6


Format:

MTHI rs









Description: 

The contents of general register rs are loaded into special register HI. 
If a MTHI operation is executed following a MULT, MULTU, DIV, or 
DIVU instruction, but before any MFLO, MFHI, MTLO, or MTHI in- 
structions, the contents of special register LO are undefined. 

Operation: 



32, 64  T-2:  HI <- undefined
        T-1:  HI <- undefined
        T:    HI <- GPR[rs]



Exceptions: 

None 
MTLO    Move To LO


MTLO 


Bits     31..26    25..21   20..6                  5..0
Field    SPECIAL   rs       0                      MTLO
Value    000000             000 0000 0000 0000     010011
Width    6         5        15                     6



Format: 

MTLO rs 
Description: 

The contents of general register rs are loaded into special register LO.
If a MTLO operation is executed following a MULT, MULTU, DIV, or 
DIVU instruction, but before any MFLO, MFHI, MTLO, or MTHI in- 
structions, the contents of special register HI are undefined. 

Operation: 



32, 64  T-2:  LO <- undefined
        T-1:  LO <- undefined
        T:    LO <- GPR[rs]



Exceptions: 

None 
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MULT 



Multiply 



MULT 





Bits     31..26    25..21   20..16   15..6          5..0
Field    SPECIAL   rs       rt       0              MULT
Value    000000                      00 0000 0000   011000
Width    6         5        5        10             6





Format: 



MULT rs, rt



Description: 

The contents of general registers rs and rt are multiplied, treating both 
operands as 32-bit 2's-complement values. No integer overflow ex- 
ception occurs under any circumstances. In 64-bit mode, the operands 
must be valid 32-bit, sign-extended values. 

When the operation completes, the low-order word of the double re- 
sult is loaded into special register LO, and the high-order word of the 
double result is loaded into special register HI. 
If either of the two preceding instructions is MFHI or MFLO, the re- 
sults of these instructions are undefined. Correct operation requires 
separating reads of HI or LO from writes by a minimum of two other 
instructions. 
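
The 32-bit signed multiply and the split of the double result into HI and LO can be sketched in Python as below. This is an illustrative model for 32-bit mode only; the function names are assumptions, and the model does not represent the pipeline timing restrictions on HI and LO.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x: int) -> int:
    """Interpret the low 32 bits of x as a 2's-complement value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def mult32(rs: int, rt: int) -> tuple:
    """Return (HI, LO) for a signed 32 x 32 -> 64-bit multiply."""
    t = (to_signed32(rs) * to_signed32(rt)) & 0xFFFFFFFFFFFFFFFF
    lo = t & MASK32              # low-order word of the double result
    hi = (t >> 32) & MASK32      # high-order word of the double result
    return hi, lo


hi, lo = mult32(0xFFFFFFFF, 2)   # (-1) * 2 = -2
print(hex(hi), hex(lo))          # 0xffffffff 0xfffffffe
```

Because the full 64-bit product is always representable in HI and LO together, no overflow condition arises.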
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MULT 



Multiply 
(continued) 



MULT 



Operation: 



32  T-2:  LO <- undefined
          HI <- undefined
    T-1:  LO <- undefined
          HI <- undefined
    T:    t <- GPR[rs] * GPR[rt]
          LO <- t_31..0
          HI <- t_63..32

64  T-2:  LO <- undefined
          HI <- undefined
    T-1:  LO <- undefined
          HI <- undefined
    T:    t <- GPR[rs]_31..0 * GPR[rt]_31..0
          LO <- (t_31)^32 || t_31..0
          HI <- (t_63)^32 || t_63..32



Exceptions: 






None 
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MULTU 



Multiply Unsigned 



MULTU 



Bits     31..26    25..21   20..16   15..6          5..0
Field    SPECIAL   rs       rt       0              MULTU
Value    000000                      00 0000 0000   011001
Width    6         5        5        10             6



Format: 

MULTU rs,rt 

Description: 

The contents of general register rs and the contents of general register 
rt are multiplied, treating both operands as unsigned values. No over- 
flow exception occurs under any circumstances. In 64-bit mode, the 
operands must be valid 32-bit, sign-extended values. 
When the operation completes, the low-order word of the double re- 
sult is loaded into special register LO, and the high-order word of the 
double result is loaded into special register HI. 
If either of the two preceding instructions is MFHI or MFLO, the re- 
sults of these instructions are undefined. Correct operation requires 
separating reads of HI or LO from writes by a minimum of two in- 
structions. 
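
For comparison with the signed multiply, the unsigned form in 64-bit mode can be sketched as follows: the operands are taken as unsigned 32-bit values, and each 32-bit half of the product is sign-extended into its 64-bit special register. The function names are illustrative assumptions.

```python
MASK32 = 0xFFFFFFFF


def sign_extend32(x: int) -> int:
    """Sign-extend the low 32 bits of x to a 64-bit register value."""
    x &= MASK32
    return x | 0xFFFFFFFF00000000 if x & 0x80000000 else x


def multu64(rs: int, rt: int) -> tuple:
    """Return 64-bit (HI, LO) for an unsigned 32 x 32 multiply in 64-bit mode."""
    t = (rs & MASK32) * (rt & MASK32)        # operands treated as unsigned
    lo = sign_extend32(t & MASK32)           # (t_31)^32 || t_31..0
    hi = sign_extend32((t >> 32) & MASK32)   # (t_63)^32 || t_63..32
    return hi, lo


hi, lo = multu64(0xFFFFFFFF, 2)              # 4294967295 * 2
print(hex(hi), hex(lo))                      # 0x1 0xfffffffffffffffe
```

The same operand bits given to the signed multiply would instead be read as -1 and 2.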
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MULTU 



Multiply Unsigned 
(continued) 



MULTU 



Operation: 



32  T-2:  LO <- undefined
          HI <- undefined
    T-1:  LO <- undefined
          HI <- undefined
    T:    t <- (0 || GPR[rs]) * (0 || GPR[rt])
          LO <- t_31..0
          HI <- t_63..32

64  T-2:  LO <- undefined
          HI <- undefined
    T-1:  LO <- undefined
          HI <- undefined
    T:    t <- (0 || GPR[rs]_31..0) * (0 || GPR[rt]_31..0)
          LO <- (t_31)^32 || t_31..0
          HI <- (t_63)^32 || t_63..32



Exceptions: 






None 
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NOR 



Nor



NOR



Bits     31..26    25..21   20..16   15..11   10..6   5..0
Field    SPECIAL   rs       rt       rd       0       NOR
Value    000000                               00000   100111
Width    6         5        5        5        5       6



Format: 

NOR rd, rs, rt
Description: 

The contents of general register rs are combined with the contents of 
general register rt in a bit-wise logical NOR operation. The result is 
placed into general register rd. 

Operation: 



32, 64  T:  GPR[rd] <- GPR[rs] nor GPR[rt]



Exceptions: 

None 
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OR 



Or



OR



Bits     31..26    25..21   20..16   15..11   10..6   5..0
Field    SPECIAL   rs       rt       rd       0       OR
Value    000000                               00000   100101
Width    6         5        5        5        5       6



Format: 

OR rd, rs, rt 
Description: 

The contents of general register rs are combined with the contents of 
general register rt in a bit-wise logical OR operation. The result is 
placed into general register rd. 

Operation: 



32, 64  T:  GPR[rd] <- GPR[rs] or GPR[rt]



Exceptions: 

None 
ORI    Or Immediate




ORI 


Bits     31..26   25..21   20..16   15..0
Field    ORI      rs       rt       immediate
Value    001101
Width    6        5        5        16





Format: 

ORI rt, rs, immediate 
Description: 

The 16-bit immediate is zero-extended and combined with the contents 
of general register rs in a bit-wise logical OR operation. The result is 
placed into general register rt. 
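
The three bit-wise operations NOR, OR, and ORI can be modeled in a few lines of Python. This is a sketch for 32-bit register values; the function names are illustrative only.

```python
MASK32 = 0xFFFFFFFF


def nor32(rs: int, rt: int) -> int:
    """Bit-wise NOR of two 32-bit values."""
    return ~(rs | rt) & MASK32


def or32(rs: int, rt: int) -> int:
    """Bit-wise OR of two 32-bit values."""
    return (rs | rt) & MASK32


def ori32(rs: int, immediate: int) -> int:
    """OR with a zero-extended 16-bit immediate: bits 31..16 of rs pass through."""
    return (rs | (immediate & 0xFFFF)) & MASK32


print(hex(nor32(0x0000FFFF, 0)))         # 0xffff0000
print(hex(or32(0x12340000, 0x00005678))) # 0x12345678
print(hex(ori32(0x12340000, 0x8000)))    # 0x12348000
```

Because the immediate is zero-extended rather than sign-extended, an immediate with bit 15 set (0x8000 above) does not disturb the upper half of the result.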

Operation: 



32  T:  GPR[rt] <- GPR[rs]_31..16 || (immediate or GPR[rs]_15..0)
64  T:  GPR[rt] <- GPR[rs]_63..16 || (immediate or GPR[rs]_15..0)



Exceptions: 

None 
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SB 



Store Byte 



SB 



Bits     31..26   25..21   20..16   15..0
Field    SB       base     rt       offset
Value    101000
Width    6        5        5        16



Format: 



SB rt, offset(base) 



Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form a virtual address. The least-significant byte of reg- 
ister rt is stored at the effective address. 
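
Before the store is issued, the low-order byte of rt is shifted into the byte lane selected by the low-order address bits and the CPU endianness. A minimal Python sketch of that lane selection for 32-bit mode follows; the function name and the boolean big_endian_cpu parameter are assumptions made for illustration.

```python
def sb_data_lane32(rt: int, vaddr: int, big_endian_cpu: bool) -> tuple:
    """Return (byte, data) as formed for SB in 32-bit mode (illustrative).

    byte -- vAddr_1..0 xor BigEndianCPU^2
    data -- GPR[rt]_31-8*byte..0 || 0^(8*byte)
    """
    byte = (vaddr & 3) ^ (3 if big_endian_cpu else 0)
    data = (rt << (8 * byte)) & 0xFFFFFFFF
    return byte, data


byte, data = sb_data_lane32(0x11223344, 1, big_endian_cpu=False)
print(byte, hex(data))   # 1 0x22334400
byte, data = sb_data_lane32(0x11223344, 1, big_endian_cpu=True)
print(byte, hex(data))   # 2 0x33440000
```

In both cases the least-significant byte of rt (0x44) ends up in the lane that the memory system will write.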

Operation: 



32  T:  vAddr <- ((offset_15)^16 || offset_15..0) + GPR[base]
        (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
        pAddr <- pAddr_PSIZE-1..2 || (pAddr_1..0 xor ReverseEndian^2)
        byte <- vAddr_1..0 xor BigEndianCPU^2
        data <- GPR[rt]_31-8*byte..0 || 0^(8*byte)
        StoreMemory (uncached, BYTE, data, pAddr, vAddr, DATA)

64  T:  vAddr <- ((offset_15)^48 || offset_15..0) + GPR[base]
        (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
        pAddr <- pAddr_PSIZE-1..3 || (pAddr_2..0 xor ReverseEndian^3)
        byte <- vAddr_2..0 xor BigEndianCPU^3
        data <- GPR[rt]_63-8*byte..0 || 0^(8*byte)
        StoreMemory (uncached, BYTE, data, pAddr, vAddr, DATA)

Exceptions:

TLB refill exception 
TLB invalid exception 
TLB modification exception 
Bus error exception 
Address error exception 
SC



Store Conditional 



SC 



Bits     31..26   25..21   20..16   15..0
Field    SC       base     rt       offset
Value    111000
Width    6        5        5        16



Format: 

SC rt, offset(base)
Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form a virtual address. The contents of general register
rt are conditionally stored at the memory location specified by the ef- 
fective address. 

This instruction implicitly performs a SYNC operation; loads and 
stores to shared memory fetched prior to the SC must access memory 
before the SC; loads and stores to shared memory fetched subsequent 
to the SC must access memory after the SC. 
If any other processor or device has modified the physical address 
since the time of the previous Load Linked instruction, or if an ERET 
instruction occurs between the Load Linked instruction and this store 
instruction, the store fails and is inhibited from taking place. 
The success or failure of the store operation (as defined above) is indi- 
cated by the contents of general register rt after execution of the in- 
struction. A successful store sets the contents of general register rt to 
1; an unsuccessful store sets it to 0. 

The operation of Store Conditional is undefined when the address is 
different from the address used in the last Load Linked. 
This instruction is available in User mode; it is not necessary for CP0
to be enabled. 

If either of the two least-significant bits of the effective address is non- 
zero, an address error exception takes place. 
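
The success and failure rules for Store Conditional can be illustrated with a toy Python model. This is only a sketch under simplifying assumptions: a single processor, memory held in a dictionary keyed by word address, and a hypothetical external_write method standing in for a write by another processor or device. The class and method names are illustrative and are not part of the R4000 definition.

```python
class ToyCpu:
    """Hypothetical single-processor model of the LLbit used by LL and SC."""

    def __init__(self, memory: dict):
        self.memory = memory      # word address -> 32-bit value
        self.llbit = 0
        self.ll_addr = None

    def ll(self, addr: int) -> int:
        if addr & 3:
            raise ValueError("address error exception")
        self.llbit, self.ll_addr = 1, addr
        return self.memory[addr]

    def external_write(self, addr: int, value: int) -> None:
        """Another processor or device modifies the location."""
        self.memory[addr] = value
        if addr == self.ll_addr:
            self.llbit = 0

    def eret(self) -> None:
        """An ERET between LL and SC causes the SC to fail."""
        self.llbit = 0

    def sc(self, addr: int, value: int) -> int:
        if addr & 3:
            raise ValueError("address error exception")
        if self.llbit:
            self.memory[addr] = value & 0xFFFFFFFF
        return self.llbit         # written to rt: 1 on success, 0 on failure


def atomic_increment(cpu: ToyCpu, addr: int) -> int:
    """Retry the LL/SC pair until the store succeeds."""
    while True:
        new = (cpu.ll(addr) + 1) & 0xFFFFFFFF
        if cpu.sc(addr, new):
            return new


mem = {0x100: 5}
cpu = ToyCpu(mem)
cpu.ll(0x100)
cpu.external_write(0x100, 9)      # intervening modification
print(cpu.sc(0x100, 6))           # 0 (store inhibited, memory still holds 9)
print(atomic_increment(cpu, 0x100), mem[0x100])   # 10 10
```

The retry loop in atomic_increment is the usual way such a pair is used: software tests the value returned in rt and repeats the sequence when it is 0.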
SC    Store Conditional (continued)    SC



If this instruction should both fail and take an exception, the exception 
takes precedence. 
Operation: 



32  T:  vAddr <- ((offset_15)^16 || offset_15..0) + GPR[base]
        (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
        data <- GPR[rt]
        if LLbit then
            StoreMemory (uncached, WORD, data, pAddr, vAddr, DATA)
        endif
        GPR[rt] <- 0^31 || LLbit
        SyncOperation()

64  T:  vAddr <- ((offset_15)^48 || offset_15..0) + GPR[base]
        (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
        pAddr <- pAddr_PSIZE-1..3 || (pAddr_2..0 xor (ReverseEndian || 0^2))
        data <- GPR[rt]_63-8*byte..0 || 0^(8*byte)
        if LLbit then
            StoreMemory (uncached, WORD, data, pAddr, vAddr, DATA)
        endif
        GPR[rt] <- 0^63 || LLbit
        SyncOperation()



Exceptions: 

TLB refill exception 
TLB invalid exception 
TLB modification exception 
Bus error exception 
Address error exception 






SCD 



Store Conditional Doubleword 



SCD 



Bits     31..26   25..21   20..16   15..0
Field    SCD      base     rt       offset
Value    111100
Width    6        5        5        16



Format: 

SCD rt, offset(base) 

Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form a virtual address. The contents of general register 
rt are conditionally stored at the memory location specified by the ef- 
fective address. 

This instruction implicitly performs a SYNC operation; loads and 
stores to shared memory fetched prior to the SCD must access memo- 
ry before the SCD; loads and stores to shared memory fetched subse- 
quent to the SCD must access memory after the SCD. 
If any other processor or device has modified the physical address 
since the time of the previous Load Linked Doubleword instruction, 
or if an ERET instruction occurs between the Load Linked Double- 
word instruction and this store instruction, the store fails and is inhib- 
ited from taking place. 

The success or failure of the store operation (as defined above) is indi- 
cated by the contents of general register rt after execution of the in- 
struction. A successful store sets the contents of general register rt to 
1; an unsuccessful store sets it to 0. 

The operation of Store Conditional Doubleword is undefined when 
the address is different from the address used in the last Load Linked 
Doubleword. 

This instruction is available in User mode; it is not necessary for CP0
to be enabled. 

If any of the three least-significant bits of the effective address is
non-zero, an address error exception takes place.
SCD    Store Conditional Doubleword (continued)    SCD



If this instruction should both fail and take an exception, the exception 
takes precedence. 

This operation is only defined for the R4000 operating in 64-bit mode. 
Execution of this instruction in 32-bit mode causes a reserved instruc- 
tion exception. 



Operation: 



64  T:  vAddr <- ((offset_15)^48 || offset_15..0) + GPR[base]
        (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
        data <- GPR[rt]
        if LLbit then
            StoreMemory (uncached, DOUBLEWORD, data, pAddr, vAddr, DATA)
        endif
        GPR[rt] <- 0^63 || LLbit
        SyncOperation()



Exceptions: 

TLB refill exception 

TLB invalid exception 

TLB modification exception 

Bus error exception 

Address error exception 

Reserved instruction exception (R4000 in 32-bit mode) 
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SD 



Store Doubleword 



SD 



Bits     31..26   25..21   20..16   15..0
Field    SD       base     rt       offset
Value    111111
Width    6        5        5        16



Format: 

SD rt, offset(base) 

Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form a virtual address. The contents of general register 
rt are stored at the memory location specified by the effective address. 
If any of the three least-significant bits of the effective address is
non-zero, an address error exception occurs.

This operation is only defined for the R4000 operating in 64-bit mode. 
Execution of this instruction in 32-bit mode causes a reserved instruc- 
tion exception. 

Operation: 



64  T:  vAddr <- ((offset_15)^48 || offset_15..0) + GPR[base]
        (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
        data <- GPR[rt]
        StoreMemory (uncached, DOUBLEWORD, data, pAddr, vAddr, DATA)



Exceptions: 

TLB refill exception 

TLB invalid exception 

TLB modification exception 

Bus error exception 

Address error exception 

Reserved instruction exception (R4000 in 32-bit user mode,
R4000 in 32-bit supervisor mode)
SDCz    Store Doubleword From Coprocessor    SDCz



Bits     31..26      25..21   20..16   15..0
Field    SDCz        base     rt       offset
Value    1111xx*
Width    6           5        5        16





Format: 



SDCz rt, offset(base) 



Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form a virtual address. Coprocessor unit z sources a 
doubleword, which the processor writes to the addressed memory lo- 
cation. The data to be stored is defined by individual coprocessor 
specifications. 

If any of the three least-significant bits of the effective address are non- 
zero, an address error exception takes place. 
This instruction is not valid for use with CP0.
This instruction is undefined when the least-significant bit of the 
rt-field is non-zero. 



*See the table, "Opcode Bit Encoding" on next page, or "CPU Instruc- 
tion Opcode Bit Encoding" at the end of Appendix A. 
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Store Doubleword From Coprocessor 
(continued) 



SDCz 



Operation: 



32  T:  vAddr <- ((offset_15)^16 || offset_15..0) + GPR[base]
        (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
        data <- COPzSD(rt)
        StoreMemory (uncached, DOUBLEWORD, data, pAddr, vAddr, DATA)

64  T:  vAddr <- ((offset_15)^48 || offset_15..0) + GPR[base]
        (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
        data <- COPzSD(rt)
        StoreMemory (uncached, DOUBLEWORD, data, pAddr, vAddr, DATA)



Exceptions: 

TLB refill exception 
TLB invalid exception 
TLB modification exception 
Bus error exception 
Address error exception 
Coprocessor unusable exception 

Opcode Bit Encoding:

SDCz    Bit #  31 30 29 28 27 26
SDC1            1  1  1  1  0  1
SDC2            1  1  1  1  1  0
SDC3            1  1  1  1  1  1

Bits 31..28: SD opcode
Bits 27..26: Coprocessor Unit Number
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SDL 



Store Doubleword Left 



SDL 



Bits     31..26   25..21   20..16   15..0
Field    SDL      base     rt       offset
Value    101100
Width    6        5        5        16



Format: 



SDL rt, offset(base)



Description: 

This instruction can be used with the SDR instruction to store the con- 
tents of a register into eight consecutive bytes of memory, when the 
bytes cross a boundary between two doublewords. SDL stores the left 
portion of the register into the appropriate part of the high-order dou- 
bleword of memory; SDR stores the right portion of the register into 
the appropriate part of the low-order doubleword. 
The SDL instruction adds its sign-extended 16-bit offset to the contents
of general register base to form a virtual address which may specify an
arbitrary byte. It alters only the doubleword in memory which contains
that byte. From one to eight bytes will be stored, depending on the
starting byte specified.

Conceptually, it starts at the most-significant byte of the register and
copies it to the specified byte in memory; then it proceeds toward the
low-order byte of the register and the low-order byte of the doubleword
in memory, copying bytes from register to memory until it reaches the
low-order byte of the doubleword in memory.
No address exceptions due to alignment are possible. 
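
The effect of SDL on the addressed doubleword can be sketched in Python. The model below assumes a big-endian CPU; the function name sdl_be and its arguments are illustrative and do not come from the R4000 specification.

```python
def sdl_be(reg: int, mem_dword: bytes, vaddr: int) -> bytes:
    """Model SDL for a big-endian CPU (illustrative only).

    reg       -- 64-bit contents of GPR[rt]
    mem_dword -- the 8 bytes of the aligned doubleword that contains vaddr
    vaddr     -- byte address; only its three low-order bits are used
    Returns the new contents of the doubleword in memory.
    """
    start = vaddr & 7                 # first byte in memory to be written
    n = 8 - start                     # number of bytes stored
    reg_bytes = reg.to_bytes(8, "big")
    # Bytes before the starting byte are untouched; the most-significant
    # n bytes of the register fill the rest of the doubleword.
    return mem_dword[:start] + reg_bytes[:n]


reg = int.from_bytes(b"ABCDEFGH", "big")
print(sdl_be(reg, b"IJKLMNOP", 1))    # b'IABCDEFG'
print(sdl_be(reg, b"IJKLMNOP", 7))    # b'IJKLMNOA'
```

With a starting byte of 1, seven bytes are stored and byte 0 of the doubleword is left unchanged; with a starting byte of 7, only the most-significant register byte is stored.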







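              memory (big-endian), before
address 8:    | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
address 0:    | 0 | 1 | 2  | 3  | 4  | 5  | 6  | 7  |

              register $24
              | A | B | C | D | E | F | G | H |

      SDL $24,1($0)

              memory (big-endian), after
address 8:    | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
address 0:    | 0 | A | B  | C  | D  | E  | F  | G  |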
SDL    Store Doubleword Left (continued)    SDL



This operation is only defined for the R4000 operating in 64-bit mode. 
Execution of this instruction in 32-bit mode causes a reserved instruc- 
tion exception. 
Operation:

64  T:  vAddr <- ((offset_15)^48 || offset_15..0) + GPR[base]
        (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
        pAddr <- pAddr_PSIZE-1..3 || (pAddr_2..0 xor ReverseEndian^3)
        if BigEndianMem = 0 then
            pAddr <- pAddr_PSIZE-1..3 || 0^3
        endif
        byte <- vAddr_2..0 xor BigEndianCPU^3
        data <- 0^(56-8*byte) || GPR[rt]_63..56-8*byte
        StoreMemory (uncached, byte, data, pAddr, vAddr, DATA)
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SDL 



Store Doubleword Left 
(continued) 



SDL 



Given a doubleword in a register and a doubleword in memory, the
operation of SDL is as follows:

SDL
Register    A B C D E F G H
Memory      I J K L M N O P

             BigEndianCPU = 0                BigEndianCPU = 1
vAddr_2..0   destination  type  offset       destination  type  offset
                                LEM  BEM                        LEM  BEM
    0        IJKLMNOA      0     0    7      ABCDEFGH      7     0    0
    1        IJKLMNAB      1     0    6      IABCDEFG      6     0    1
    2        IJKLMABC      2     0    5      IJABCDEF      5     0    2
    3        IJKLABCD      3     0    4      IJKABCDE      4     0    3
    4        IJKABCDE      4     0    3      IJKLABCD      3     0    4
    5        IJABCDEF      5     0    2      IJKLMABC      2     0    5
    6        IABCDEFG      6     0    1      IJKLMNAB      1     0    6
    7        ABCDEFGH      7     0    0      IJKLMNOA      0     0    7

LEM     BigEndianMem = 0
BEM     BigEndianMem = 1
Type    AccessType (see Figure 2-2) sent to memory
Offset  pAddr_2..0 sent to memory

Exceptions: 

TLB refill exception 

TLB invalid exception 

TLB modification exception 

Bus error exception 

Address error exception 

Reserved instruction exception (R4000 in 32-bit mode) 
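
The destination column of the SDL table can be reproduced with a few lines of code. The following Python sketch is illustrative only: the function name sdl_merge and the byte-string representation (most-significant byte first) are assumptions made for this example, and address translation and the memory interface are not modelled.

```python
def sdl_merge(reg: bytes, mem: bytes, vaddr_low: int, big_endian_cpu: bool) -> bytes:
    """Return the memory doubleword after SDL, most-significant byte first."""
    if len(reg) != 8 or len(mem) != 8 or not 0 <= vaddr_low <= 7:
        raise ValueError("expected two 8-byte values and vAddr(2..0) in 0..7")
    # byte <- vAddr(2..0) xor BigEndianCPU^3
    byte = vaddr_low ^ (0b111 if big_endian_cpu else 0)
    count = byte + 1  # number of bytes stored (the access type plus one)
    # The high-order `count` register bytes replace the low-order
    # `count` bytes of the doubleword; the rest of memory is unchanged.
    return mem[:8 - count] + reg[:count]


print(sdl_merge(b"ABCDEFGH", b"IJKLMNOP", 2, False))  # b'IJKLMABC'
```

With a big-endian CPU the same call with vaddr_low = 2 stores six bytes instead of three, which matches the mirrored right-hand half of the table.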
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SDR 



Store Doubleword Right 



SDR 



31..26    25..21    20..16    15..0
SDR       base      rt        offset
101101
6         5         5         16



Format: 

SDR rt, offset(base) 

Description: 

This instruction can be used with the SDL instruction to store the
contents of a register into eight consecutive bytes of memory, when the
bytes cross a boundary between two doublewords. SDR stores the
right portion of the register into the appropriate part of the low-order
doubleword; SDL stores the left portion of the register into the
appropriate part of the high-order doubleword of memory.
The SDR instruction adds its sign-extended 16-bit offset to the contents
of general register base to form a virtual address which may specify an
arbitrary byte. It alters only the doubleword in memory which contains
that byte. From one to eight bytes will be stored, depending on the
starting byte specified.

Conceptually, it starts at the least-significant (rightmost) byte of the
register and copies it to the specified byte in memory; then it proceeds
toward the high-order byte of the register and the high-order byte of
the doubleword in memory, copying bytes from register to memory
until it reaches the high-order byte of the doubleword in memory.
No address exceptions due to alignment are possible. 



[Figure: big-endian memory (addresses 0 through 15) and register $24
(A B C D E F G H), shown before and after the example store at 4($0).]
SDR

Store Doubleword Right
(continued)

SDR



This operation is only defined for the R4000 operating in 64-bit mode. 
Execution of this instruction in 32-bit mode causes a reserved instruc- 
tion exception. 
Operation:

64 T: vAddr <- ((offset_15)^48 || offset_15..0) + GPR[base]
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
      pAddr <- pAddr_(PSIZE-1)..3 || (pAddr_2..0 xor ReverseEndian^3)
      if BigEndianMem = 1 then
          pAddr <- pAddr_(PSIZE-1)..3 || 0^3
      endif
      byte <- vAddr_2..0 xor BigEndianCPU^3
      data <- GPR[rt]_(63-8*byte)..0 || 0^(8*byte)
      StoreMemory (uncached, DOUBLEWORD-byte, data, pAddr, vAddr, DATA)
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SDR 



Store Doubleword Right 
(continued) 



SDR 



Given a doubleword in a register and a doubleword in memory, the
operation of SDR is as follows:

Register   A B C D E F G H
Memory     I J K L M N O P

             BigEndianCPU = 0                  BigEndianCPU = 1
vAddr_2..0   destination       type  offset    destination       type  offset
                                     LEM BEM                           LEM BEM
0            A B C D E F G H   7     0   0     H J K L M N O P   0     7   0
1            B C D E F G H P   6     1   0     G H K L M N O P   1     6   0
2            C D E F G H O P   5     2   0     F G H L M N O P   2     5   0
3            D E F G H N O P   4     3   0     E F G H M N O P   3     4   0
4            E F G H M N O P   3     4   0     D E F G H N O P   4     3   0
5            F G H L M N O P   2     5   0     C D E F G H O P   5     2   0
6            G H K L M N O P   1     6   0     B C D E F G H P   6     1   0
7            H J K L M N O P   0     7   0     A B C D E F G H   7     0   0

LEM      BigEndianMem = 0
BEM      BigEndianMem = 1
Type     AccessType (see Figure 2-2) sent to memory
Offset   pAddr_2..0 sent to memory

Exceptions: 

TLB refill exception 

TLB invalid exception 

TLB modification exception 

Bus error exception 

Address error exception 

Reserved instruction exception (R4000 in 32-bit mode) 
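
The destination column of the SDR table follows the same pattern in mirror image. The Python sketch below is an illustration under stated assumptions: sdr_merge is a hypothetical helper name, both values are written most-significant byte first, and only the byte merge is modelled (no address translation, no exceptions).

```python
def sdr_merge(reg: bytes, mem: bytes, vaddr_low: int, big_endian_cpu: bool) -> bytes:
    """Return the memory doubleword after SDR, most-significant byte first."""
    if len(reg) != 8 or len(mem) != 8 or not 0 <= vaddr_low <= 7:
        raise ValueError("expected two 8-byte values and vAddr(2..0) in 0..7")
    # byte <- vAddr(2..0) xor BigEndianCPU^3
    byte = vaddr_low ^ (0b111 if big_endian_cpu else 0)
    # The low-order (8 - byte) register bytes replace the high-order
    # (8 - byte) bytes of the doubleword; the low `byte` bytes of memory survive.
    return reg[byte:] + mem[8 - byte:]


print(sdr_merge(b"ABCDEFGH", b"IJKLMNOP", 1, False))  # b'BCDEFGHP'
```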
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SH 



Store Halfword 



SH 




Format: 

SH rt, offset(base) 
Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form an unsigned effective address. The least-signifi- 
cant halfword of register rt is stored at the effective address. If the 
least-significant bit of the effective address is non-zero, an address er- 
ror exception occurs. 

Operation:

32 T: vAddr <- ((offset_15)^16 || offset_15..0) + GPR[base]
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
      pAddr <- pAddr_(PSIZE-1)..2 || (pAddr_1..0 xor (ReverseEndian || 0))
      byte <- vAddr_1..0 xor (BigEndianCPU || 0)
      data <- GPR[rt]_(31-8*byte)..0 || 0^(8*byte)
      StoreMemory (uncached, HALFWORD, data, pAddr, vAddr, DATA)

64 T: vAddr <- ((offset_15)^48 || offset_15..0) + GPR[base]
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
      pAddr <- pAddr_(PSIZE-1)..3 || (pAddr_2..0 xor (ReverseEndian^2 || 0))
      byte <- vAddr_2..0 xor (BigEndianCPU^2 || 0)
      data <- GPR[rt]_(63-8*byte)..0 || 0^(8*byte)
      StoreMemory (uncached, HALFWORD, data, pAddr, vAddr, DATA)

Exceptions: 

TLB refill exception 
TLB invalid exception 
TLB modification exception 
Bus error exception 
Address error exception 
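
The address calculation and alignment rule for SH can be sketched as follows. This Python fragment is a simplified model, not the processor's implementation: the names AddressError, sign_extend16 and sh_effective_address are invented for the example, and address translation is omitted.

```python
class AddressError(Exception):
    """Stands in for the address error exception."""


def sign_extend16(value: int) -> int:
    """Interpret the low 16 bits of value as a signed offset."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def sh_effective_address(base: int, offset: int, bits: int = 32) -> int:
    """Add the sign-extended offset to base and check halfword alignment."""
    address = (base + sign_extend16(offset)) & ((1 << bits) - 1)
    if address & 1:  # least-significant bit must be zero
        raise AddressError(f"halfword store to odd address {address:#x}")
    return address


print(hex(sh_effective_address(0x1000, 0xFFFE)))  # 0xffe
```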
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SLL 



Shift Left Logical 



SLL 





31..26    25..21    20..16    15..11    10..6     5..0
SPECIAL   00000     rt        rd        sa        SLL
000000                                            000000
6         5         5         5         5         6

Format:

SLL rd, rt, sa

Description: 

The contents of general register rt are shifted left by sa bits, inserting 
zeros into the low-order bits. The result is placed in register rd. In 64- 
bit mode, the operand must be a valid sign-extended, 32-bit value. 

Operation:

32 T: GPR[rd] <- GPR[rt]_(31-sa)..0 || 0^sa

64 T: s <- 0 || sa
      temp <- GPR[rt]_(31-s)..0 || 0^s
      GPR[rd] <- (temp_31)^32 || temp



Exceptions: 

None 
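
The difference between the 32-bit and 64-bit forms of the operation is only the final sign extension of the 32-bit result. A minimal Python sketch, assuming registers are held as non-negative integers (the function names sll32 and sll64 are illustrative):

```python
def sll32(rt: int, sa: int) -> int:
    """32-bit mode: shift left, discard bits shifted out of bit 31."""
    return (rt << (sa & 0x1F)) & 0xFFFFFFFF


def sll64(rt: int, sa: int) -> int:
    """64-bit mode: same 32-bit shift, then sign-extend bit 31 into bits 63..32."""
    temp = sll32(rt, sa)
    if temp & 0x80000000:
        temp |= 0xFFFFFFFF00000000
    return temp


print(hex(sll32(1, 31)))  # 0x80000000
print(hex(sll64(1, 31)))  # 0xffffffff80000000
```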



R4000 User's Manual-Preliminary 



A-139 



Appendix A 



SLLV 



Shift Left Logical Variable 



SLLV 



31..26    25..21    20..16    15..11    10..6     5..0
SPECIAL   rs        rt        rd        00000     SLLV
000000                                            000100
6         5         5         5         5         6



Format: 

SLLV rd,rt,rs 
Description: 

The contents of general register rt are shifted left by the number of bits
specified by the low-order five bits of general register rs, inserting
zeros into the low-order bits. The result is placed in register rd.
In 64-bit mode, the operand must be a valid sign-extended, 32-bit
value.
Operation:

32 T: s <- GPR[rs]_4..0
      GPR[rd] <- GPR[rt]_(31-s)..0 || 0^s

64 T: s <- 0 || GPR[rs]_4..0
      temp <- GPR[rt]_(31-s)..0 || 0^s
      GPR[rd] <- (temp_31)^32 || temp

Exceptions:

None
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SLT 



Set On Less Than 



SLT 



31..26    25..21    20..16    15..11    10..6     5..0
SPECIAL   rs        rt        rd        00000     SLT
000000                                            101010
6         5         5         5         5         6



Format: 

SLT rd, rs, rt 
Description: 

The contents of general register rt are subtracted from the contents of 
general register rs. Considering both quantities as signed integers, if 
the contents of general register rs are less than the contents of general 
register rt, the result is set to one, otherwise the result is set to zero. 
The result is placed into general register rd. 

No integer overflow exception occurs under any circumstances. The 
comparison is valid even if the subtraction used during the compari- 
son overflows. 
Operation:

32 T: if GPR[rs] < GPR[rt] then
          GPR[rd] <- 0^31 || 1
      else
          GPR[rd] <- 0^32
      endif

64 T: if GPR[rs] < GPR[rt] then
          GPR[rd] <- 0^63 || 1
      else
          GPR[rd] <- 0^64
      endif



Exceptions: 

None 
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SLTI 



Set On Less Than Immediate 



SLTI 



31..26    25..21    20..16    15..0
SLTI      rs        rt        immediate
001010
6         5         5         16



Format: 

SLTI rt, rs, immediate 

Description: 

The 16-bit immediate is sign-extended and subtracted from the con- 
tents of general register rs. Considering both quantities as signed inte- 
gers, if rs is less than the sign-extended immediate, the result is set to 
one, otherwise the result is set to zero. The result is placed into general 
register rt. 

No integer overflow exception occurs under any circumstances. The 
comparison is valid even if the subtraction used during the compari- 
son overflows. 
Operation:

32 T: if GPR[rs] < (immediate_15)^16 || immediate_15..0 then
          GPR[rt] <- 0^31 || 1
      else
          GPR[rt] <- 0^32
      endif

64 T: if GPR[rs] < (immediate_15)^48 || immediate_15..0 then
          GPR[rt] <- 0^63 || 1
      else
          GPR[rt] <- 0^64
      endif



Exceptions: 

None 
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SLTIU 



Set On Less Than 
Immediate Unsigned 



SLTIU 



31..26    25..21    20..16    15..0
SLTIU     rs        rt        immediate
001011
6         5         5         16

Format:

SLTIU rt, rs, immediate

Description: 

The 16-bit immediate is sign-extended and subtracted from the con- 
tents of general register rs. Considering both quantities as unsigned 
integers, if rs is less than the sign-extended immediate, the result is set 
to one, otherwise the result is set to zero. The result is placed into
general register rt.

No integer overflow exception occurs under any circumstances. The 
comparison is valid even if the subtraction used during the compari- 
son overflows. 
Operation:

32 T: if (0 || GPR[rs]) < (immediate_15)^16 || immediate_15..0 then
          GPR[rt] <- 0^31 || 1
      else
          GPR[rt] <- 0^32
      endif

64 T: if (0 || GPR[rs]) < (immediate_15)^48 || immediate_15..0 then
          GPR[rt] <- 0^63 || 1
      else
          GPR[rt] <- 0^64
      endif



Exceptions: 

None 
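
Because the immediate is sign-extended before the unsigned comparison, an immediate with bit 15 set represents a value near the top of the unsigned range rather than a value between 32768 and 65535. The Python sketch below illustrates this; the function name sltiu and the bits parameter are assumptions made for the example.

```python
def sltiu(rs: int, immediate: int, bits: int = 32) -> int:
    """Return 1 if rs < sign-extended immediate, comparing as unsigned integers."""
    mask = (1 << bits) - 1
    imm = immediate & 0xFFFF
    if imm & 0x8000:
        imm |= mask & ~0xFFFF  # replicate bit 15 into the upper bits
    return 1 if (rs & mask) < imm else 0


print(sltiu(5, 0xFFFF))           # 1: immediate becomes 0xFFFFFFFF
print(sltiu(0xFFFFFFFF, 0xFFFF))  # 0: equal, so not less than
```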
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SLTU 



Set On Less Than Unsigned 



SLTU 





31..26    25..21    20..16    15..11    10..6     5..0
SPECIAL   rs        rt        rd        00000     SLTU
000000                                            101011
6         5         5         5         5         6




Format: 


SLTU rd, rs, rt 













Description: 

The contents of general register rt are subtracted from the contents of 
general register rs. Considering both quantities as unsigned integers, 
if the contents of general register rs are less than the contents of gener- 
al register rt, the result is set to one, otherwise the result is set to zero. 
The result is placed into general register rd. 

No integer overflow exception occurs under any circumstances. The 
comparison is valid even if the subtraction used during the compari- 
son overflows. 
Operation:

32 T: if (0 || GPR[rs]) < (0 || GPR[rt]) then
          GPR[rd] <- 0^31 || 1
      else
          GPR[rd] <- 0^32
      endif

64 T: if (0 || GPR[rs]) < (0 || GPR[rt]) then
          GPR[rd] <- 0^63 || 1
      else
          GPR[rd] <- 0^64
      endif







Exceptions: 

None 
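
SLT and SLTU differ only in how the same bit patterns are interpreted. The following Python sketch contrasts the two; the helper names to_signed, slt and sltu are illustrative, and registers are modelled as non-negative integers.

```python
def to_signed(value: int, bits: int = 32) -> int:
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def slt(rs: int, rt: int, bits: int = 32) -> int:
    """Signed comparison, as performed by SLT."""
    return 1 if to_signed(rs, bits) < to_signed(rt, bits) else 0


def sltu(rs: int, rt: int, bits: int = 32) -> int:
    """Unsigned comparison, as performed by SLTU."""
    mask = (1 << bits) - 1
    return 1 if (rs & mask) < (rt & mask) else 0


print(slt(0xFFFFFFFF, 1))   # 1: -1 is less than 1
print(sltu(0xFFFFFFFF, 1))  # 0: 4294967295 is not less than 1
```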
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SRA 



Shift Right Arithmetic 



SRA 



31..26    25..21    20..16    15..11    10..6     5..0
SPECIAL   00000     rt        rd        sa        SRA
000000                                            000011
6         5         5         5         5         6



Format: 

SRA rd, rt, sa 
Description: 

The contents of general register rt are shifted right by sa bits, sign- 
extending the high-order bits. The result is placed in register rd. In 64- 
bit mode, the operand must be a valid sign-extended, 32-bit value. 

Operation:

32 T: GPR[rd] <- (GPR[rt]_31)^sa || GPR[rt]_31..sa

64 T: s <- 0 || sa
      temp <- (GPR[rt]_31)^s || GPR[rt]_31..s
      GPR[rd] <- (temp_31)^32 || temp



Exceptions: 

None 
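
The 32-bit operation replicates bit 31 of rt into the vacated high-order positions. A small Python sketch of that behavior, assuming a register is held as a non-negative integer (sra32 is a name chosen for this example):

```python
def sra32(rt: int, sa: int) -> int:
    """Arithmetic right shift of a 32-bit value by sa (0..31) bits."""
    rt &= 0xFFFFFFFF
    sa &= 0x1F
    result = rt >> sa
    if rt & 0x80000000:
        # Fill the sa vacated high-order bits with copies of the sign bit.
        result |= (0xFFFFFFFF << (32 - sa)) & 0xFFFFFFFF
    return result


print(hex(sra32(0x80000000, 4)))  # 0xf8000000
print(hex(sra32(0x7FFFFFFF, 4)))  # 0x7ffffff
```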
SRAV

Shift Right Arithmetic Variable

SRAV





31..26    25..21    20..16    15..11    10..6     5..0
SPECIAL   rs        rt        rd        00000     SRAV
000000                                            000111
6         5         5         5         5         6




Format: 


SRAV rd, rt, rs 













Description: 

The contents of general register rt are shifted right by the number of 
bits specified by the low-order five bits of general register rs, sign- 
extending the high-order bits. The result is placed in register rd. In 
64-bit mode, the operand must be a valid sign-extended, 32-bit value. 

Operation:

32 T: s <- GPR[rs]_4..0
      GPR[rd] <- (GPR[rt]_31)^s || GPR[rt]_31..s

64 T: s <- GPR[rs]_4..0
      temp <- (GPR[rt]_31)^s || GPR[rt]_31..s
      GPR[rd] <- (temp_31)^32 || temp



Exceptions: 

None 
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SRL 



Shift Right Logical 



SRL 



31..26    25..21    20..16    15..11    10..6     5..0
SPECIAL   00000     rt        rd        sa        SRL
000000                                            000010
6         5         5         5         5         6

Format:

SRL rd, rt, sa
Description: 

The contents of general register rt are shifted right by sa bits, inserting 
zeros into the high-order bits. The result is placed in register rd. In 
64-bit mode, the operand must be a valid sign-extended, 32-bit value. 

Operation:

32 T: GPR[rd] <- 0^sa || GPR[rt]_31..sa

64 T: s <- 0 || sa
      temp <- 0^s || GPR[rt]_31..s
      GPR[rd] <- (temp_31)^32 || temp



Exceptions: 

None 
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SRLV 



Shift Right Logical Variable 



SRLV 



31..26    25..21    20..16    15..11    10..6     5..0
SPECIAL   rs        rt        rd        00000     SRLV
000000                                            000110
6         5         5         5         5         6



Format: 

SRLV rd, rt, rs 

Description: 

The contents of general register rt are shifted right by the number of 
bits specified by the low-order five bits of general register rs, inserting 
zeros into the high-order bits. The result is placed in register rd. In 64- 
bit mode, the operand must be a valid sign-extended, 32-bit value. 

Operation:

32 T: s <- GPR[rs]_4..0
      GPR[rd] <- 0^s || GPR[rt]_31..s

64 T: s <- GPR[rs]_4..0
      temp <- 0^s || GPR[rt]_31..s
      GPR[rd] <- (temp_31)^32 || temp



Exceptions: 

None 
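
Only the low-order five bits of rs take part in a variable shift, so a shift count of 36 behaves like a shift count of 4. A minimal Python illustration for the 32-bit case (srlv32 is a hypothetical helper name):

```python
def srlv32(rt: int, rs: int) -> int:
    """Logical right shift of a 32-bit value by the low five bits of rs."""
    s = rs & 0x1F  # s <- GPR[rs] bits 4..0
    return (rt & 0xFFFFFFFF) >> s


print(hex(srlv32(0x80000000, 4)))   # 0x8000000
print(hex(srlv32(0x80000000, 36)))  # 0x8000000, since 36 & 0x1F == 4
```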
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SUB 



Subtract 



SUB 





31..26    25..21    20..16    15..11    10..6     5..0
SPECIAL   rs        rt        rd        00000     SUB
000000                                            100010
6         5         5         5         5         6

Format:

SUB rd, rs, rt













Description: 

The contents of general register rt are subtracted from the contents of 

general register rs to form a result. The result is placed into general 

register rd. In 64-bit mode, the operands must be valid sign-extended,

32-bit values. 

The only difference between this instruction and the SUBU instruction 

is that SUBU never traps on overflow. 

An integer overflow exception takes place if the carries out of bits 30 

and 31 differ (2's-complement overflow). The destination register rd is 

not modified when an integer overflow exception occurs. 

Operation:

32 T: GPR[rd] <- GPR[rs] - GPR[rt]

64 T: temp <- GPR[rs] - GPR[rt]
      GPR[rd] <- (temp_31)^32 || temp_31..0



Exceptions: 

Integer overflow exception 
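
The trapping behavior of SUB, as opposed to SUBU, can be modelled in a few lines. This Python sketch detects overflow by checking whether the exact signed result fits in 32 bits, which is equivalent to the carry condition described above; the names IntegerOverflow, sub32 and subu32 are invented for the example.

```python
class IntegerOverflow(Exception):
    """Stands in for the integer overflow exception."""


def to_signed32(value: int) -> int:
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def sub32(rs: int, rt: int) -> int:
    """SUB: trap when the signed result does not fit in 32 bits."""
    result = to_signed32(rs) - to_signed32(rt)
    if not -(1 << 31) <= result < (1 << 31):
        raise IntegerOverflow("rd is left unmodified")
    return result & 0xFFFFFFFF


def subu32(rs: int, rt: int) -> int:
    """SUBU: same subtraction, never traps."""
    return (rs - rt) & 0xFFFFFFFF


print(hex(subu32(0x80000000, 1)))  # 0x7fffffff
```

Calling sub32(0x80000000, 1) with the same operands raises IntegerOverflow, since the most negative 32-bit integer minus one is not representable.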
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SUBU 



Subtract Unsigned 



SUBU 



31..26    25..21    20..16    15..11    10..6     5..0
SPECIAL   rs        rt        rd        00000     SUBU
000000                                            100011
6         5         5         5         5         6

Format:

SUBU rd, rs, rt

Description: 

The contents of general register rt are subtracted from the contents of 
general register rs to form a result. The result is placed into general 
register rd. In 64-bit mode, the operands must be valid sign-extended, 
32-bit values. 

The only difference between this instruction and the SUB instruction 
is that SUBU never traps on overflow. No integer overflow exception 
occurs under any circumstances. 
Operation:

32 T: GPR[rd] <- GPR[rs] - GPR[rt]

64 T: temp <- GPR[rs] - GPR[rt]
      GPR[rd] <- (temp_31)^32 || temp_31..0



Exceptions: 

None 
SW



Store Word 



SW 



31..26    25..21    20..16    15..0
SW        base      rt        offset
101011
6         5         5         16

Format:

SW rt, offset(base)



Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form a virtual address. The contents of general register 
rt are stored at the memory location specified by the effective address. 

If either of the two least-significant bits of the effective address are 
non-zero, an address error exception occurs. 

Operation:

32 T: vAddr <- ((offset_15)^16 || offset_15..0) + GPR[base]
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
      data <- GPR[rt]
      StoreMemory (uncached, WORD, data, pAddr, vAddr, DATA)

64 T: vAddr <- ((offset_15)^48 || offset_15..0) + GPR[base]
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
      pAddr <- pAddr_(PSIZE-1)..3 || (pAddr_2..0 xor (ReverseEndian || 0^2))
      byte <- vAddr_2..0 xor (BigEndianCPU || 0^2)
      data <- GPR[rt]_(63-8*byte)..0 || 0^(8*byte)
      StoreMemory (uncached, WORD, data, pAddr, vAddr, DATA)

Exceptions: 

TLB refill exception 
TLB invalid exception 
TLB modification exception 
Bus error exception 
Address error exception 
SWCz

Store Word From Coprocessor

SWCz



31..26    25..21    20..16    15..0
SWCz      base      rt        offset
1110xx*
6         5         5         16



Format: 



SWCz rt, offset(base) 



Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form a virtual address. Coprocessor unit z sources a
word, which the processor writes to the addressed memory location.
The data to be stored is defined by individual coprocessor
specifications. This instruction is not valid for use with CP0. If either
of the two least-significant bits of the effective address is non-zero, an 
address error exception occurs. 

Execution of the instruction referencing coprocessor 3 causes a 
reserved instruction exception, not a coprocessor unusable exception. 

Operation:

32 T: vAddr <- ((offset_15)^16 || offset_15..0) + GPR[base]
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
      byte <- vAddr_1..0
      data <- COPzSW (byte, rt)
      StoreMemory (uncached, WORD, data, pAddr, vAddr, DATA)

64 T: vAddr <- ((offset_15)^48 || offset_15..0) + GPR[base]
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
      pAddr <- pAddr_(PSIZE-1)..3 || (pAddr_2..0 xor (ReverseEndian || 0^2))
      byte <- vAddr_2..0 xor (BigEndianCPU || 0^2)
      data <- COPzSW (byte, rt)
      StoreMemory (uncached, WORD, data, pAddr, vAddr, DATA)



*See the table "Opcode Bit Encoding" on next page, or "CPU 
Instruction Opcode Bit Encoding" at the end of Appendix A. 






SWCz 



Store Word From Coprocessor 
(Continued) 



SWCz 



Exceptions: 

TLB refill exception 

TLB invalid exception 

TLB modification exception 

Bus error exception 

Address error exception 

Coprocessor unusable exception 

Reserved instruction exception (coprocessor 3) 



Opcode Bit Encoding:

SWCz    Bit #   31  30  29  28  27  26

SWC1             1   1   1   0   0   1
SWC2             1   1   1   0   1   0
SWC3             1   1   1   0   1   1

Bits 31..28: SW opcode
Bits 27..26: Coprocessor Unit Number
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SWL 



Store Word Left 



SWL 




Format: 

SWL rt, offset(base) 

Description: 

This instruction can be used with the SWR instruction to store the 
contents of a register into four consecutive bytes of memory, when the 
bytes cross a boundary between two words. SWL stores the left 
portion of the register into the appropriate part of the high-order word 
of memory; SWR stores the right portion of the register into the 
appropriate part of the low-order word. 
The SWL instruction adds its sign-extended 16-bit offset to the 
contents of general register base to form a virtual address which may 
specify an arbitrary byte. It alters only the word in memory which 
contains that byte. From one to four bytes will be stored, depending 
on the starting byte specified. 

Conceptually, it starts at the most-significant byte of the register and 
copies it to the specified byte in memory; then it proceeds toward the 
low-order byte of the register and the low-order byte of the word in 
memory, copying bytes from register to memory until it reaches the 
low-order byte of the word in memory. 
No address exceptions due to alignment are possible. 



[Figure: SWL example showing big-endian memory (addresses 0 through 7)
and register $24 (A B C D).]
SWL

Store Word Left
(Continued)

SWL



Operation:

32 T: vAddr <- ((offset_15)^16 || offset_15..0) + GPR[base]
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
      pAddr <- pAddr_(PSIZE-1)..2 || (pAddr_1..0 xor ReverseEndian^2)
      if BigEndianMem = 0 then
          pAddr <- pAddr_(PSIZE-1)..2 || 0^2
      endif
      byte <- vAddr_1..0 xor BigEndianCPU^2
      data <- 0^(24-8*byte) || GPR[rt]_31..(24-8*byte)
      StoreMemory (uncached, byte, data, pAddr, vAddr, DATA)

64 T: vAddr <- ((offset_15)^48 || offset_15..0) + GPR[base]
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
      pAddr <- pAddr_(PSIZE-1)..3 || (pAddr_2..0 xor ReverseEndian^3)
      if BigEndianMem = 0 then
          pAddr <- pAddr_(PSIZE-1)..2 || 0^2
      endif
      byte <- vAddr_1..0 xor BigEndianCPU^2
      if (vAddr_2 xor BigEndianCPU) = 0 then
          data <- 0^32 || 0^(24-8*byte) || GPR[rt]_31..(24-8*byte)
      else
          data <- 0^(24-8*byte) || GPR[rt]_31..(24-8*byte) || 0^32
      endif
      StoreMemory (uncached, byte, data, pAddr, vAddr, DATA)
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SWL 



Store Word Left 
(Continued) 



SWL 



Given a doubleword in a register and a doubleword in memory, the
operation of SWL is as follows:

Register   A B C D E F G H
Memory     I J K L M N O P

             BigEndianCPU = 0                  BigEndianCPU = 1
vAddr_2..0   destination       type  offset    destination       type  offset
                                     LEM BEM                           LEM BEM
0            I J K L M N O E   0     0   7     E F G H M N O P   3     4   0
1            I J K L M N E F   1     0   6     I E F G M N O P   2     4   1
2            I J K L M E F G   2     0   5     I J E F M N O P   1     4   2
3            I J K L E F G H   3     0   4     I J K E M N O P   0     4   3
4            I J K E M N O P   0     4   3     I J K L E F G H   3     0   4
5            I J E F M N O P   1     4   2     I J K L M E F G   2     0   5
6            I E F G M N O P   2     4   1     I J K L M N E F   1     0   6
7            E F G H M N O P   3     4   0     I J K L M N O E   0     0   7

LEM      BigEndianMem = 0
BEM      BigEndianMem = 1
Type     AccessType (see Figure 2-2) sent to memory
Offset   pAddr_2..0 sent to memory

Exceptions:

TLB refill exception
TLB invalid exception
TLB modification exception
Bus error exception
Address error exception
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SWR 



Store Word Right 



SWR 



31..26    25..21    20..16    15..0
SWR       base      rt        offset
101110
6         5         5         16



Format: 

SWR rt, offset(base) 
Description: 

This instruction can be used with the SWL instruction to store the 
contents of a register into four consecutive bytes of memory, when the 
bytes cross a boundary between two words. SWR stores the right 
portion of the register into the appropriate part of the low-order word; 
SWL stores the left portion of the register into the appropriate part of
the high-order word of memory.

The SWR instruction adds its sign-extended 16-bit offset to the 
contents of general register base to form a virtual address which may 
specify an arbitrary byte. It alters only the word in memory which 
contains that byte. From one to four bytes will be stored, depending 
on the starting byte specified. 

Conceptually, it starts at the least-significant (rightmost) byte of the 
register and copies it to the specified byte in memory; then it proceeds 
toward the high-order byte of the register and the high-order byte of 
the word in memory, copying bytes from register to memory until it 
reaches the high-order byte of the word in memory. 
No address exceptions due to alignment are possible. 
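
The way SWL and SWR combine to store a word at an unaligned address can be sketched as follows. This Python model covers only a big-endian configuration with byte-addressed memory held in a bytearray; the function names swl_be and swr_be are assumptions made for the example, and exceptions and address translation are not modelled.

```python
def swl_be(mem: bytearray, addr: int, reg: int) -> None:
    """SWL, big-endian: copy from the register's most-significant byte
    rightward, stopping at the low-order end of the word containing addr."""
    data = (reg & 0xFFFFFFFF).to_bytes(4, "big")
    count = 4 - (addr & 3)
    mem[addr:addr + count] = data[:count]


def swr_be(mem: bytearray, addr: int, reg: int) -> None:
    """SWR, big-endian: copy from the register's least-significant byte
    leftward, stopping at the high-order end of the word containing addr."""
    data = (reg & 0xFFFFFFFF).to_bytes(4, "big")
    count = (addr & 3) + 1
    start = addr & ~3
    mem[start:start + count] = data[4 - count:]


memory = bytearray(8)
swl_be(memory, 1, 0xAABBCCDD)  # bytes 1..3 receive AA BB CC
swr_be(memory, 4, 0xAABBCCDD)  # byte 4 receives DD
print(memory.hex())  # 00aabbccdd000000
```

In this model the unaligned word at address 1 is written by an SWL at the address of its first byte and an SWR at the address of its last byte (address + 3).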



[Figure: SWR example showing big-endian memory (addresses 0 through 7)
and the register (A B C D) before and after the store.]
SWR

Store Word Right
(Continued)

SWR



Operation:

32 T: vAddr <- ((offset_15)^16 || offset_15..0) + GPR[base]
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
      pAddr <- pAddr_(PSIZE-1)..2 || (pAddr_1..0 xor ReverseEndian^2)
      if BigEndianMem = 1 then
          pAddr <- pAddr_(PSIZE-1)..2 || 0^2
      endif
      byte <- vAddr_1..0 xor BigEndianCPU^2
      data <- GPR[rt]_(31-8*byte)..0 || 0^(8*byte)
      StoreMemory (uncached, WORD-byte, data, pAddr, vAddr, DATA)

64 T: vAddr <- ((offset_15)^48 || offset_15..0) + GPR[base]
      (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
      pAddr <- pAddr_(PSIZE-1)..3 || (pAddr_2..0 xor ReverseEndian^3)
      if BigEndianMem = 1 then
          pAddr <- pAddr_(PSIZE-1)..2 || 0^2
      endif
      byte <- vAddr_1..0 xor BigEndianCPU^2
      if (vAddr_2 xor BigEndianCPU) = 0 then
          data <- 0^32 || GPR[rt]_(31-8*byte)..0 || 0^(8*byte)
      else
          data <- GPR[rt]_(31-8*byte)..0 || 0^(8*byte) || 0^32
      endif
      StoreMemory (uncached, WORD-byte, data, pAddr, vAddr, DATA)
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SWR 



Store Word Right 
(Continued) 



SWR 



Given a doubleword in a register and a doubleword in memory, the
operation of SWR is as follows:

Register   A B C D E F G H
Memory     I J K L M N O P

             BigEndianCPU = 0                  BigEndianCPU = 1
vAddr_2..0   destination       type  offset    destination       type  offset
                                     LEM BEM                           LEM BEM
0            I J K L E F G H   3     0   4     H J K L M N O P   0     7   0
1            I J K L F G H P   2     1   4     G H K L M N O P   1     6   0
2            I J K L G H O P   1     2   4     F G H L M N O P   2     5   0
3            I J K L H N O P   0     3   4     E F G H M N O P   3     4   0
4            E F G H M N O P   3     4   0     I J K L H N O P   0     3   4
5            F G H L M N O P   2     5   0     I J K L G H O P   1     2   4
6            G H K L M N O P   1     6   0     I J K L F G H P   2     1   4
7            H J K L M N O P   0     7   0     I J K L E F G H   3     0   4

LEM      BigEndianMem = 0
BEM      BigEndianMem = 1
Type     AccessType (see Figure 2-2) sent to memory
Offset   pAddr_2..0 sent to memory

Exceptions:

TLB refill exception
TLB invalid exception
TLB modification exception
Bus error exception
Address error exception
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SYNC 



Synchronize 



SYNC 



31..26    25..6                       5..0
SPECIAL   0000 0000 0000 0000 0000    SYNC
000000                                001111
6         20                          6



Format: 

SYNC 
Description: 

The SYNC instruction ensures that any loads and stores fetched prior 
to the present instruction are completed before any loads or stores after 
this instruction are allowed to start. Use of the SYNC instruction to 
serialize certain memory references may be required in a
multiprocessor environment for proper synchronization.

For example: 



Processor A                  Processor B

    SW    R1, DATA           1: LW    R2, FLAG
    LI    R2, 1                 BEQ   R2, R0, 1B
    SYNC                        NOP
    SW    R2, FLAG              SYNC
                                LW    R1, DATA



The SYNC in processor A prevents DATA being written after FLAG,
which could cause processor B to read stale data. The SYNC in
processor B prevents DATA from being read before FLAG, which
could likewise result in reading stale data. For processors which only
execute loads and stores in order, with respect to shared memory, this
instruction is a NOP.

LL and SC instructions implicitly perform a SYNC.
This instruction is allowed in User mode.



Operation: 



32, 64 T: SyncOperation()



Exceptions: 

None 
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SYSCALL 


System Call 


SYSCALL 


31..26    25..6     5..0
SPECIAL   code      SYSCALL
000000              001100
6         20        6



Format: 



SYSCALL 



Description: 

A system call exception occurs, immediately and unconditionally 
transferring control to the exception handler. 
The code field is available for use as software parameters, but is 
retrieved by the exception handler only by loading the contents of the 
memory word containing the instruction. 

Operation: 




Exceptions: 

System Call exception 
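
Since the code field is not delivered to the handler in a register, a handler that wants it has to load the instruction word and extract bits 25..6 itself. The following Python sketch shows that extraction on an instruction word that has already been fetched; the function name syscall_code and the constants are illustrative, and how the handler locates the instruction is outside the scope of this example.

```python
SPECIAL_OPCODE = 0b000000
SYSCALL_FUNCT = 0b001100


def syscall_code(instruction: int) -> int:
    """Return the 20-bit code field (bits 25..6) of a SYSCALL instruction word."""
    opcode = (instruction >> 26) & 0x3F
    funct = instruction & 0x3F
    if opcode != SPECIAL_OPCODE or funct != SYSCALL_FUNCT:
        raise ValueError("not a SYSCALL instruction word")
    return (instruction >> 6) & 0xFFFFF


word = (0x12345 << 6) | SYSCALL_FUNCT  # SYSCALL with code 0x12345
print(hex(syscall_code(word)))  # 0x12345
```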
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TEQ 



Trap If Equal 



TEQ 





31..26    25..21    20..16    15..6     5..0
SPECIAL   rs        rt        code      TEQ
000000                                  110100
6         5         5         10        6

Format:

TEQ rs, rt
Description: 

The contents of general register rt are compared to general register rs. 
If the contents of general register rs are equal to the contents of general 
register rt, a trap exception occurs. 

The code field is available for use as software parameters, but is 
retrieved by the exception handler only by loading the contents of the 
memory word containing the instruction. 

Operation:

32, 64 T: if GPR[rs] = GPR[rt] then
              TrapException
          endif



Exceptions: 

Trap exception 
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TEQI 



Trap If Equal Immediate 



TEQI 





31..26    25..21    20..16    15..0
REGIMM    rs        TEQI      immediate
000001              01100
6         5         5         16



Format: 

TEQI rs, immediate 
Description: 

The 16-bit immediate is sign-extended and compared to the contents of 
general register rs. If the contents of general register rs are equal to the 
sign-extended immediate, a trap exception occurs. 

Operation:

32 T: if GPR[rs] = (immediate_15)^16 || immediate_15..0 then
          TrapException
      endif

64 T: if GPR[rs] = (immediate_15)^48 || immediate_15..0 then
          TrapException
      endif



Exceptions: 

Trap exception 
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TGE 



Trap If Greater Than Or Equal 



TGE 



31..26    25..21    20..16    15..6     5..0
SPECIAL   rs        rt        code      TGE
000000                                  110000
6         5         5         10        6



Format: 

TGE rs, rt 
Description: 

The contents of general register rt are compared to the contents of 
general register rs. Considering both quantities as signed integers, if 
the contents of general register rs are greater than or equal to the 
contents of general register rt, a trap exception occurs. 
The code field is available for use as software parameters, but is 
retrieved by the exception handler only by loading the contents of the 
memory word containing the instruction. 

Operation:

32, 64 T: if GPR[rs] >= GPR[rt] then
              TrapException
          endif



Exceptions: 

Trap exception 






TGEI

Trap If Greater Than Or Equal Immediate

TGEI





31..26    25..21    20..16    15..0
REGIMM    rs        TGEI      immediate
000001              01000
6         5         5         16



Format: 

TGEI rs, immediate 
Description: 

The 16-bit immediate is sign-extended and compared to the contents of 
general register rs. Considering both quantities as signed integers, if 
the contents of general register rs are greater than or equal to the sign- 
extended immediate, a trap exception occurs. 

Operation:

32 T: if GPR[rs] >= (immediate_15)^16 || immediate_15..0 then
          TrapException
      endif

64 T: if GPR[rs] >= (immediate_15)^48 || immediate_15..0 then
          TrapException
      endif



Exceptions: 

Trap exception 
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TGEIU 



Trap If Greater Than Or Equal 
Immediate Unsigned 



TGEIU 





31..26    25..21    20..16    15..0
REGIMM    rs        TGEIU     immediate
000001              01001
6         5         5         16







Format: 

TGEIU rs, immediate 

Description: 

The 16-bit immediate is sign-extended and compared to the contents of 
general register rs. Considering both quantities as unsigned integers, 
if the contents of general register rs are greater than or equal to the 
sign-extended immediate, a trap exception occurs. 

Operation:

32 T: if (0 || GPR[rs]) >= (0 || (immediate_15)^16 || immediate_15..0) then
          TrapException
      endif

64 T: if (0 || GPR[rs]) >= (0 || (immediate_15)^48 || immediate_15..0) then
          TrapException
      endif



Exceptions: 

Trap exception 
TGEU

Trap If Greater Than Or Equal Unsigned

TGEU



31..26    25..21    20..16    15..6     5..0
SPECIAL   rs        rt        code      TGEU
000000                                  110001
6         5         5         10        6



Format: 

TGEU rs, rt 
Description: 

The contents of general register rt are compared to the contents of 
general register rs. Considering both quantities as unsigned integers, 
if the contents of general register rs are greater than or equal to the 
contents of general register rt, a trap exception occurs. 
The code field is available for use as software parameters, but is 
retrieved by the exception handler only by loading the contents of the 
memory word containing the instruction. 

Operation: 



T: if (0 || GPR[rs]) ≥ (0 || GPR[rt]) then
       TrapException
   endif



Exceptions: 

Trap exception 
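TLBP        Probe TLB For Matching Entry        TLBP

 31       26  25  24                                  6 5         0
+-----------+----+-------------------------------------+-----------+
|   COP0    | CO |                                     |   TLBP    |
|  010000   | 1  |       000 0000 0000 0000 0000       |  001000   |
+-----------+----+-------------------------------------+-----------+
      6       1                    19                        6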



Format: 



TLBP 



Description: 

The Index register is loaded with the address of the TLB entry whose
contents match the contents of the EntryHi register. If no TLB entry
matches, the high-order bit of the Index register is set.

The architecture does not specify the operation of memory references
associated with the instruction immediately after a TLBP instruction,
nor is the operation specified if more than one TLB entry matches.

Operation: 



32 T: Index <- 1 || 0^25 || undefined^6
      for i in 0..TLBEntries-1
          if (TLB[i]_95..77 = EntryHi_31..12) and (TLB[i]_76 or
             (TLB[i]_71..64 = EntryHi_7..0)) then
              Index <- 0^26 || i_5..0
          endif
      endfor

64 T: Index <- 1 || 0^31
      for i in 0..TLBEntries-1
          if ((TLB[i]_167..141 and not (0^15 || TLB[i]_216..205))
             = (EntryHi_39..13 and not (0^15 || TLB[i]_216..205))) and
             (TLB[i]_140 or (TLB[i]_135..128 = EntryHi_7..0)) then
              Index <- 0^26 || i_5..0
          endif
      endfor



Exceptions: 

Coprocessor unusable exception 
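
The probe loop above can be sketched in Python as follows. This is an illustrative model only: the `TlbEntry` type and its field names are assumptions introduced here, and the bit-level layout of a TLB entry is deliberately abstracted away.

```python
from typing import List, NamedTuple


class TlbEntry(NamedTuple):
    vpn2: int   # virtual page number field compared against EntryHi
    asid: int   # address space identifier
    g: bool     # global bit: when set, the ASID comparison is skipped


def tlbp(tlb: List[TlbEntry], entryhi_vpn2: int, entryhi_asid: int) -> int:
    """Return the value the Index register would hold after a probe."""
    index = 1 << 31                      # assume failure: high-order bit set
    for i, entry in enumerate(tlb):
        if entry.vpn2 == entryhi_vpn2 and (entry.g or entry.asid == entryhi_asid):
            index = i & 0x3F             # entry number placed in Index bits 5..0
    return index
```

As in the pseudocode, a later matching entry overwrites an earlier one; the manual leaves the multiple-match case unspecified, so software should not rely on that behavior.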
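TLBR        Read Indexed TLB Entry        TLBR

 31       26  25  24                                  6 5         0
+-----------+----+-------------------------------------+-----------+
|   COP0    | CO |                                     |   TLBR    |
|  010000   | 1  |       000 0000 0000 0000 0000       |  000001   |
+-----------+----+-------------------------------------+-----------+
      6       1                    19                        6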



Format: 



TLBR 



Description: 

The G bit (which controls ASID matching) read from the TLB is written
into both EntryLo0 and EntryLo1.

The EntryHi and EntryLo registers are loaded with the contents of the
TLB entry pointed at by the contents of the TLB Index register. The
operation is invalid (and the results are unspecified) if the contents of
the TLB Index register are greater than the number of TLB entries in the
processor.

Operation:



32 T: PageMask <- TLB[Index_5..0]_127..96
      EntryHi  <- TLB[Index_5..0]_95..64 and not TLB[Index_5..0]_127..96
      EntryLo1 <- TLB[Index_5..0]_63..32
      EntryLo0 <- TLB[Index_5..0]_31..0

64 T: PageMask <- TLB[Index_5..0]_255..192
      EntryHi  <- TLB[Index_5..0]_191..128 and not TLB[Index_5..0]_255..192
      EntryLo1 <- TLB[Index_5..0]_127..65 || TLB[Index_5..0]_140
      EntryLo0 <- TLB[Index_5..0]_63..1 || TLB[Index_5..0]_140



Exceptions: 

Coprocessor unusable exception 
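TLBWI        Write Indexed TLB Entry        TLBWI

 31       26  25  24                                  6 5         0
+-----------+----+-------------------------------------+-----------+
|   COP0    | CO |                                     |   TLBWI   |
|  010000   | 1  |       000 0000 0000 0000 0000       |  000010   |
+-----------+----+-------------------------------------+-----------+
      6       1                    19                        6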



Format: 



TLBWI 



Description: 

The G bit of the TLB is written with the logical AND of the G bits in
EntryLo0 and EntryLo1.

The TLB entry pointed at by the contents of the TLB Index register is
loaded with the contents of the EntryHi and EntryLo registers.

The operation is invalid (and the results are unspecified) if the
contents of the TLB Index register are greater than the number of TLB
entries in the processor.

Operation:



32, 64 T: TLB[Index_5..0] <-
              PageMask || (EntryHi and not PageMask) || EntryLo1 || EntryLo0



Exceptions: 

Coprocessor unusable exception 
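
The handling of the G bit on a TLB write and a later TLB read can be summarized in a few lines. The sketch below is an assumption-laden illustration in Python (the function names are invented here); it models only the single G bit, not the rest of the entry.

```python
def tlb_write_g(entrylo0_g: int, entrylo1_g: int) -> int:
    """G bit stored in the TLB by TLBWI or TLBWR: AND of both EntryLo G bits."""
    return entrylo0_g & entrylo1_g


def tlb_read_g(tlb_g: int) -> tuple:
    """G bits returned by TLBR: the stored bit is copied to EntryLo0 and EntryLo1."""
    return tlb_g, tlb_g
```

One consequence is that a write with mismatched G bits is not round-trip safe: writing (1, 0) and reading the entry back yields (0, 0).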
TLBWR        Write Random TLB Entry        TLBWR

 31       26  25  24                                  6 5         0
+-----------+----+-------------------------------------+-----------+
|   COP0    | CO |                                     |   TLBWR   |
|  010000   | 1  |       000 0000 0000 0000 0000       |  000110   |
+-----------+----+-------------------------------------+-----------+
      6       1                    19                        6





Format: 

TLBWR 
Description: 

The G bit of the TLB is written with the logical AND of the G bits in
EntryLo0 and EntryLo1.

The TLB entry pointed at by the contents of the TLB Random register
is loaded with the contents of the EntryHi and EntryLo registers.

Operation: 



32, 64 T: TLB[Random_5..0] <-
              PageMask || (EntryHi and not PageMask) || EntryLo1 || EntryLo0



Exceptions: 

Coprocessor unusable exception 






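TLT        Trap If Less Than        TLT

 31       26 25      21 20      16 15                6 5         0
+-----------+----------+----------+-------------------+-----------+
|  SPECIAL  |    rs    |    rt    |       code        |    TLT    |
|  000000   |          |          |                   |  110010   |
+-----------+----------+----------+-------------------+-----------+
      6          5          5              10               6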



Format: 

TLT rs, rt
Description: 

The contents of general register rt are compared to general register rs. 
Considering both quantities as signed integers, if the contents of 
general register rs are less than the contents of general register rt, a 
trap exception occurs. 

The code field is available for use as software parameters, but is 
retrieved by the exception handler only by loading the contents of the 
memory word containing the instruction. 

Operation: 



32, 64 T: if GPR[rs] < GPR[rt] then
              TrapException
          endif



Exceptions: 

Trap exception 






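TLTI        Trap If Less Than Immediate        TLTI

 31       26 25      21 20      16 15                            0
+-----------+----------+----------+-------------------------------+
|  REGIMM   |    rs    |   TLTI   |           immediate           |
|  000001   |          |  01010   |                               |
+-----------+----------+----------+-------------------------------+
      6          5          5                     16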



Format: 

TLTI rs, immediate 

Description: 

The 16-bit immediate is sign-extended and compared to the contents of 
general register rs. Considering both quantities as signed integers, if 
the contents of general register rs are less than the sign-extended 
immediate, a trap exception occurs. 

Operation: 



32 T: if GPR[rs] < (immediate_15)^16 || immediate_15..0 then
          TrapException
      endif

64 T: if GPR[rs] < (immediate_15)^48 || immediate_15..0 then
          TrapException
      endif



Exceptions: 

Trap exception 
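TLTIU        Trap If Less Than Immediate Unsigned        TLTIU

 31       26 25      21 20      16 15                            0
+-----------+----------+----------+-------------------------------+
|  REGIMM   |    rs    |  TLTIU   |           immediate           |
|  000001   |          |  01011   |                               |
+-----------+----------+----------+-------------------------------+
      6          5          5                     16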



Format: 

TLTIU rs, immediate 

Description: 



The 16-bit immediate is sign-extended and compared to the contents of
general register rs. Considering both quantities as unsigned integers,
if the contents of general register rs are less than the sign-extended
immediate, a trap exception occurs.



Operation: 



32 T: if (0 || GPR[rs]) < (0 || (immediate_15)^16 || immediate_15..0) then
          TrapException
      endif

64 T: if (0 || GPR[rs]) < (0 || (immediate_15)^48 || immediate_15..0) then
          TrapException
      endif



Exceptions: 

Trap exception 
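TLTU        Trap If Less Than Unsigned        TLTU

 31       26 25      21 20      16 15                6 5         0
+-----------+----------+----------+-------------------+-----------+
|  SPECIAL  |    rs    |    rt    |       code        |   TLTU    |
|  000000   |          |          |                   |  110011   |
+-----------+----------+----------+-------------------+-----------+
      6          5          5              10               6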






Format: 


TLTU rs, rt

Description: 

The contents of general register rt are compared to general register rs. 
Considering both quantities as unsigned integers, if the contents of 
general register rs are less than the contents of general register rt, a 
trap exception occurs. 

The code field is available for use as software parameters, but is 
retrieved by the exception handler only by loading the contents of the 
memory word containing the instruction. 

Operation: 



32, 64 T: if (0 || GPR[rs]) < (0 || GPR[rt]) then
              TrapException
          endif



Exceptions: 

Trap exception 
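TNE        Trap If Not Equal        TNE

 31       26 25      21 20      16 15                6 5         0
+-----------+----------+----------+-------------------+-----------+
|  SPECIAL  |    rs    |    rt    |       code        |    TNE    |
|  000000   |          |          |                   |  110110   |
+-----------+----------+----------+-------------------+-----------+
      6          5          5              10               6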



Format: 



TNE rs, rt



Description: 

The contents of general register rt are compared to general register rs. 
If the contents of general register rs are not equal to the contents of 
general register rt, a trap exception occurs. 
The code field is available for use as software parameters, but is 
retrieved by the exception handler only by loading the contents of the 
memory word containing the instruction. 

Operation: 



32, 64 T: if GPR[rs] ≠ GPR[rt] then
              TrapException
          endif



Exceptions: 

Trap exception 
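TNEI        Trap If Not Equal Immediate        TNEI

 31       26 25      21 20      16 15                            0
+-----------+----------+----------+-------------------------------+
|  REGIMM   |    rs    |   TNEI   |           immediate           |
|  000001   |          |  01110   |                               |
+-----------+----------+----------+-------------------------------+
      6          5          5                     16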





Format: 

TNEI rs, immediate 

Description: 

The 16-bit immediate is sign-extended and compared to the contents of 
general register rs. If the contents of general register rs are not equal to 
the sign-extended immediate, a trap exception occurs. 

Operation: 



32 T: if GPR[rs] ≠ (immediate_15)^16 || immediate_15..0 then
          TrapException
      endif

64 T: if GPR[rs] ≠ (immediate_15)^48 || immediate_15..0 then
          TrapException
      endif



Exceptions: 

Trap exception 
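XOR        Exclusive Or        XOR

 31       26 25      21 20      16 15      11 10       6 5         0
+-----------+----------+----------+----------+----------+-----------+
|  SPECIAL  |    rs    |    rt    |    rd    |          |    XOR    |
|  000000   |          |          |          |  00000   |  100110   |
+-----------+----------+----------+----------+----------+-----------+
      6          5          5          5          5           6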



Format: 

XOR rd, rs, rt 
Description: 

The contents of general register rs are combined with the contents of 
general register rt in a bit-wise logical exclusive OR operation. The 
result is placed into general register rd. 

Operation: 



32, 64 T: GPR[rd] <- GPR[rs] xor GPR[rt]



Exceptions: 

None 
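XORI        Exclusive OR Immediate        XORI

 31       26 25      21 20      16 15                            0
+-----------+----------+----------+-------------------------------+
|   XORI    |    rs    |    rt    |           immediate           |
|  001110   |          |          |                               |
+-----------+----------+----------+-------------------------------+
      6          5          5                     16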







Format: 

XORI rt, rs, immediate 

Description: 

The 16-bit immediate is zero-extended and combined with the contents 
of general register rs in a bit-wise logical exclusive OR operation. The 
result is placed into general register rt. 

Operation: 



32 T: GPR[rt] <- GPR[rs] xor (0^16 || immediate)
64 T: GPR[rt] <- GPR[rs] xor (0^48 || immediate)



Exceptions: 

None 
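
Unlike the trap-immediate instructions, XORI zero-extends its immediate, so only the low 16 bits of the register can change. A small Python sketch (the function name and `bits` parameter are assumptions for illustration):

```python
def xori(gpr_rs: int, imm: int, bits: int = 32) -> int:
    """Model XORI: zero-extend the 16-bit immediate, then exclusive-OR."""
    mask = (1 << bits) - 1
    return (gpr_rs ^ (imm & 0xFFFF)) & mask
```

For example, `xori(0xFFFF0000, 0xFFFF)` leaves the upper halfword untouched and sets every bit of the lower halfword.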
CPU Instruction Opcode Bit Encoding

The remainder of this Appendix presents the opcode bit encoding for
the CPU instruction set (ISA and extensions), as implemented by the
R4000. Figure A-2 lists the R4000 Opcode Bit Encoding.

                                  Opcode
         28..26
31..29     0        1        2        3        4        5        6        7
   0    SPECIAL  REGIMM   J        JAL      BEQ      BNE      BLEZ     BGTZ
   1    ADDI     ADDIU    SLTI     SLTIU    ANDI     ORI      XORI     LUI
   2    COP0     COP1     COP2     *        BEQL     BNEL     BLEZL    BGTZL
   3    DADDIε   DADDIUε  LDLε     LDRε     *        *        *        *
   4    LB       LH       LWL      LW       LBU      LHU      LWR      LWUε
   5    SB       SH       SWL      SW       SDLε     SDRε     SWR      CACHEδ
   6    LL       LWC1     LWC2     *        LLDε     LDC1     LDC2     LDC3
   7    SC       SWC1     SWC2     *        SCDε     SDC1     SDC2     SDC3

                             SPECIAL function
         2..0
 5..3      0        1        2        3        4        5        6        7
   0    SLL      *        SRL      SRA      SLLV     *        SRLV     SRAV
   1    JR       JALR     *        *        SYSCALL  BREAK    *        SYNC
   2    MFHI     MTHI     MFLO     MTLO     DSLLVε   *        DSRLVε   DSRAVε
   3    MULT     MULTU    DIV      DIVU     DMULTε   DMULTUε  DDIVε    DDIVUε
   4    ADD      ADDU     SUB      SUBU     AND      OR       XOR      NOR
   5    *        *        SLT      SLTU     DADDε    DADDUε   DSUBε    DSUBUε
   6    TGE      TGEU     TLT      TLTU     TEQ      *        TNE      *
   7    DSLLε    *        DSRLε    DSRAε    DSLL32ε  *        DSRL32ε  DSRA32ε

                                REGIMM rt
         18..16
20..19     0        1        2        3        4        5        6        7
   0    BLTZ     BGEZ     BLTZL    BGEZL    *        *        *        *
   1    TGEI     TGEIU    TLTI     TLTIU    TEQI     *        TNEI     *
   2    BLTZAL   BGEZAL   BLTZALL  BGEZALL  *        *        *        *
   3    *        *        *        *        *        *        *        *

                                 COPz rs
         23..21
25..24     0        1        2        3        4        5        6        7
   0    MF       DMFε     CF       γ        MT       DMTε     CT       γ
   1    BC       γ        γ        γ        γ        γ        γ        γ
   2    CO
   3    CO

Figure A-2 R4000 Opcode Bit Encoding
                                 COPz rt
         18..16
20..19     0        1        2        3        4        5        6        7
   0    BCF      BCT      BCFL     BCTL     γ        γ        γ        γ
   1    γ        γ        γ        γ        γ        γ        γ        γ
   2    γ        γ        γ        γ        γ        γ        γ        γ
   3    γ        γ        γ        γ        γ        γ        γ        γ

                               CP0 Function
         2..0
 5..3      0        1        2        3        4        5        6        7
   0    φ        TLBR     TLBWI    φ        φ        φ        TLBWR    φ
   1    TLBP     φ        φ        φ        φ        φ        φ        φ
   2    ξ        φ        φ        φ        φ        φ        φ        φ
   3    ERETχ    φ        φ        φ        φ        φ        φ        φ
   4    φ        φ        φ        φ        φ        φ        φ        φ
   5    φ        φ        φ        φ        φ        φ        φ        φ
   6    φ        φ        φ        φ        φ        φ        φ        φ
   7    φ        φ        φ        φ        φ        φ        φ        φ

Figure A-2 R4000 Opcode Bit Encoding (cont.)



Key:

  *   Operation codes marked with an asterisk cause reserved
      instruction exceptions in all current implementations and are
      reserved for future versions of the architecture.

  γ   Operation codes marked with a gamma cause a reserved
      instruction exception. They are reserved for future versions
      of the architecture.

  δ   Operation codes marked with a delta are valid only for
      R4000 processors with CP0 enabled, and cause a reserved
      instruction exception on other processors.

  φ   Operation codes marked with a phi are invalid but do not
      cause reserved instruction exceptions in R4000
      implementations.

  ξ   Operation codes marked with a xi cause a reserved
      instruction exception on R4000 processors.

  χ   Operation codes marked with a chi are valid only on R4000.

  ε   Operation codes marked with an epsilon are valid when the
      processor is operating as a 64-bit processor. These instructions
      will cause a reserved instruction exception if 64-bit
      operation is not enabled.
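
To look up an instruction word in the encoding tables, the fields must first be extracted, and each 6-bit code then splits into a row (upper three bits) and a column (lower three bits). The following Python sketch illustrates this; `decode` and `table_position` are hypothetical helper names, not part of the manual.

```python
def decode(word: int) -> dict:
    """Split a 32-bit instruction word into its fixed-position fields."""
    return {
        "opcode": (word >> 26) & 0x3F,    # bits 31..26
        "rs": (word >> 21) & 0x1F,        # bits 25..21
        "rt": (word >> 16) & 0x1F,        # bits 20..16
        "rd": (word >> 11) & 0x1F,        # bits 15..11
        "function": word & 0x3F,          # bits 5..0
        "immediate": word & 0xFFFF,       # bits 15..0
    }


def table_position(six_bit_code: int) -> tuple:
    """Return (row, column) of a 6-bit code in an 8 x 8 encoding table."""
    return (six_bit_code >> 3) & 0x7, six_bit_code & 0x7
```

For instance, a word with opcode 000000 and function 100110 selects row 4, column 6 of the SPECIAL function table, which is XOR.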
B

This appendix provides a detailed description of the operation of each
Floating-Point (FPU) instruction. The instructions are listed
alphabetically. The exceptions that may occur due to the execution of
each instruction are listed after the description of each instruction.
The description of the immediate causes and the manner of handling
exceptions is omitted from the instruction descriptions in this
appendix. Refer to Chapter 7 for detailed descriptions of
floating-point exceptions and handling.

Figure B-5 lists the entire bit encoding for the constant fields of the
Floating-Point instruction set; the bit encoding for each instruction is
included with that individual instruction.

Instruction Formats 

There are three basic instruction format types: 

• I-Type, or Immediate instructions, which include load and 
store operations, 

• M-Type, or Move instructions, and 

• R-Type, or Register instructions, which include the two- 
and three-register Floating-Point operations. 

The instruction description subsections that follow show how the
three basic instruction formats are used by:

• Load and store instructions, 

• Move instructions, and 

• Floating-Point Computational instructions. 

A fourth instruction description subsection describes the special
instruction format used by floating-point branch instructions.

Floating-point instructions are mapped onto the MIPS coprocessor
instructions, defining coprocessor unit number one (CP1) as the
floating-point unit.

Each operation is valid only for certain formats. Implementations may
support some of these formats and operations only through emulation,
but need only support the combinations that are valid, which are
marked with a "V" in Table B-1 below. Combinations marked with an "R"
are not currently specified by this architecture and cause an
unimplemented instruction trap, to maintain compatibility with future
architecture extensions.

Table B-1 Valid FPU Instruction Formats

                          Source Format
Operation     Single    Double    Word    Longword
ADD             V         V        R         R
SUB             V         V        R         R
MUL             V         V        R         R
DIV             V         V        R         R
SQRT            V         V        R         R
ABS             V         V        R         R
MOV             V         V
NEG             V         V        R         R
TRUNC.L         V         V
ROUND.L         V         V
CEIL.L          V         V
FLOOR.L         V         V
TRUNC.W         V         V
ROUND.W         V         V
CEIL.W          V         V
FLOOR.W         V         V
CVT.S                     V        V         V
CVT.D           V                  V         V
CVT.W           V         V
CVT.L           V         V
C               V         V        R         R
The coprocessor branch on condition true/false instructions can be
used to logically negate any predicate. Thus, the 32 possible
conditions require only 16 distinct comparisons, as shown in
Table B-2 below.

Table B-2 Logical Negation of Predicates by Condition True/False

     Condition                     Relations                  Invalid
   Mnemonic                                                   operation
                        Greater   Less                        exception if
  True    False   Code  Than      Than    Equal   Unordered   unordered
  F       T         0     F        F        F        F           No
  UN      OR        1     F        F        F        T           No
  EQ      NEQ       2     F        F        T        F           No
  UEQ     OGL       3     F        F        T        T           No
  OLT     UGE       4     F        T        F        F           No
  ULT     OGE       5     F        T        F        T           No
  OLE     UGT       6     F        T        T        F           No
  ULE     OGT       7     F        T        T        T           No
  SF      ST        8     F        F        F        F           Yes
  NGLE    GLE       9     F        F        F        T           Yes
  SEQ     SNE      10     F        F        T        F           Yes
  NGL     GL       11     F        F        T        T           Yes
  LT      NLT      12     F        T        F        F           Yes
  NGE     GE       13     F        T        F        T           Yes
  LE      NLE      14     F        T        T        F           Yes
  NGT     GT       15     F        T        T        T           Yes
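
Reading the table row by row, the low three bits of the condition code act as a mask over the Less Than, Equal, and Unordered relations, and the high bit selects the invalid operation behavior. The Python sketch below encodes that reading; the function name and argument names are assumptions, and it illustrates the table only, not the hardware.

```python
def compare_condition(code: int, less: bool, equal: bool, unordered: bool) -> tuple:
    """Evaluate a 4-bit compare condition code against the operand relation.

    Returns (condition, invalid): the "True" predicate of the table and
    whether an invalid operation exception is signaled.
    """
    condition = bool(
        ((code & 0b100) and less)
        or ((code & 0b010) and equal)
        or ((code & 0b001) and unordered)
    )
    invalid = bool((code & 0b1000) and unordered)
    return condition, invalid
```

No code bit selects Greater Than directly; a "greater than" test is obtained by evaluating the complementary predicate (for example OLE, code 6) and branching on condition false.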
Floating-Point Loads, Stores, and Moves

All movement of data between the floating-point coprocessor and
memory is accomplished by coprocessor load and store operations,
which reference the floating-point coprocessor's General-Purpose
Registers. These operations are unformatted; no format conversions are
performed and, therefore, no floating-point exceptions occur due to
these operations.

Data may also be directly moved between the floating-point
coprocessor and the processor by move to coprocessor and move from
coprocessor instructions. Like the floating-point load and store
operations, move to/from operations perform no format conversions and
never cause floating-point exceptions.

An additional pair of coprocessor registers, called Floating-Point
Control registers, is available. The only data movement operations
supported for these registers are moves to and from processor
General-Purpose Registers.

Floating-Point Operations

The floating-point unit's operation set includes floating-point add,
subtract, multiply, divide, square root, convert between fixed-point
and floating-point format, convert between floating-point formats,
and floating-point compare. These operations satisfy IEEE Standard
754's requirements for accuracy. Specifically, these operations obtain
a result which is identical to computing the result with infinite
precision and then rounding to the specified format, using the current
rounding mode.

Instructions must specify the format of their operands. Except for
conversion functions, mixed-format operations are not provided.
Instruction Notational Conventions

In this appendix, all variable subfields in an instruction format (such
as fs, ft, immediate, and so on) are shown with lower-case names. The
instruction name (such as ADD, SUB, and so on) is shown in upper case.

For the sake of clarity, an alias is sometimes substituted for a variable
subfield in the formats of specific instructions. For example, we use
rs = base in the format for load and store instructions. Such an alias is
always lower case, since it refers to a variable subfield.

In some instructions, however, the two instruction subfields op and
function have constant 6-bit values. When reference is made to these
instructions, upper-case mnemonics are used. In the floating-point
instruction, for example, we use op = COP1 and function = FADD. In
some cases, a single field has both fixed and variable subfields, so the
name contains both upper and lower case characters. Actual bit
encoding for mnemonics is shown in Figure B-5 at the end of this
appendix, and is also included with each individual instruction.

In the instruction description examples that follow, the Operation
section describes the operation performed by each instruction using a
high-level language notation.

Instruction Notation Examples 

Example #1:

    GPR[ft] <- immediate || 0^16

Sixteen zero bits are concatenated with an immediate value
(typically 16 bits), and the 32-bit string (with the lower 16 bits
set to zero) is assigned to GPR register ft.

Example #2:

    (immediate_15)^16 || immediate_15..0

Bit 15 (the sign bit) of an immediate value is extended for 16 bit
positions, and the result is concatenated with bits 15 through 0 of
the immediate value to form a 32-bit sign-extended value.
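
The two notation examples translate directly into shifts and masks. The following Python sketch is illustrative only; the function names are assumptions introduced here.

```python
def shift_into_upper(immediate: int) -> int:
    """Example #1: immediate || 0^16 (immediate in the upper halfword)."""
    return (immediate & 0xFFFF) << 16


def sign_extend_16_to_32(immediate: int) -> int:
    """Example #2: (immediate_15)^16 || immediate_15..0."""
    immediate &= 0xFFFF
    sign = (immediate >> 15) & 1
    upper = 0xFFFF if sign else 0x0000    # bit 15 replicated 16 times
    return (upper << 16) | immediate      # concatenated with bits 15..0
```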
Load and Store Instructions

In the MIPS ISA, all load operations have a delay of at least one
instruction. That is, the instruction immediately following a load cannot
use the contents of the register that will be loaded with the data being
fetched from storage.

In the R4000, the instruction immediately following a load may use
the contents of the register loaded. In such cases, the hardware will
interlock, requiring additional real cycles, so scheduling load delay slots
is still desirable, although not absolutely required for functional code.

When the FR bit in the Status register equals zero, the Floating-Point
General Registers (FGR) are 32 bits wide. When the FR bit in the Status
register equals one, the Floating-Point General Registers (FGR) are
64 bits wide. The behavior of the load and store instructions is
dependent on the width of the FGRs.

In the load/store operation descriptions, the functions listed in
Table B-3 are used to summarize the handling of virtual addresses and
physical memory.

Table B-3 Load/Store Common Functions

Function             Meaning

AddressTranslation   Uses the TLB to find the physical address given the
                     virtual address. The function fails and an exception
                     is taken if the required translation is not present
                     in the TLB.

LoadMemory           Uses the cache and main memory to find the contents
                     of the word containing the specified physical
                     address. The low-order two bits of the address and
                     the access type field indicate which of the four
                     bytes within the data word need to be returned. If
                     the cache is enabled for this access, the entire
                     word is returned and loaded into the cache.

StoreMemory          Uses the cache, write buffer, and main memory to
                     store the word or part of word specified as data in
                     the word containing the specified physical address.
                     The low-order two bits of the address and the access
                     type field indicate which of the four bytes within
                     the data word should be stored.
Figure B-3 shows the I-Type instruction format used by load and store
operations.

I-Type (Immediate)

 31       26 25      21 20      16 15                            0
+-----------+----------+----------+-------------------------------+
|    op     |   base   |    ft    |            offset             |
+-----------+----------+----------+-------------------------------+
      6          5          5                     16

where:

    op       is a 6-bit operation code
    base     is the 5-bit base register specifier
    ft       is a 5-bit source (for stores) or destination (for loads)
             FPA register specifier
    offset   is the 16-bit signed immediate offset

Figure B-3 Load and Store Instruction Format

All coprocessor loads and stores reference aligned word data items.
Thus, for word loads and stores, the access type field is always
WORD, and the low-order two bits of the address must always be
zero.

For doubleword loads and stores, the access type field is always
DOUBLEWORD, and the low-order three bits of the address must always
be zero.

Regardless of byte-numbering order (endianness), the address
specifies that byte which has the smallest byte-address of all of the
bytes in the addressed field. For a Big-endian machine, this is the
leftmost byte; for a Little-endian machine, this is the rightmost byte.
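
The alignment and byte-ordering rules can be checked with a short Python sketch. The helper names are assumptions for illustration, and the standard `struct` module stands in for memory layout.

```python
import struct


def is_aligned(address: int, size: int) -> bool:
    """True when the low-order address bits are zero for an access of `size` bytes."""
    return address % size == 0


def lowest_addressed_byte(word: int, big_endian: bool) -> int:
    """Return the byte of a 32-bit word that sits at the smallest byte address."""
    layout = ">I" if big_endian else "<I"
    return struct.pack(layout, word & 0xFFFFFFFF)[0]
```

For the word 0x11223344, the byte at the lowest address is 0x11 (the leftmost byte) in Big-endian order and 0x44 (the rightmost byte) in Little-endian order.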
Computational Instructions

Computational instructions include all of the arithmetic floating-point
operations performed by the FPU.

Figure B-4 shows the R-Type instruction format used for
computational operations.

R-Type (Register)

 31       26 25      21 20      16 15      11 10       6 5         0
+-----------+----------+----------+----------+----------+-----------+
|   COP1    |   fmt    |    ft    |    fs    |    fd    | function  |
+-----------+----------+----------+----------+----------+-----------+
      6          5          5          5          5           6

where:

    COP1       is a 6-bit major operation code
    fmt        is a 5-bit format specifier
    fs         is a 5-bit source1 register
    ft         is a 5-bit source2 register
    fd         is a 5-bit destination register
    function   is a 6-bit function field

Figure B-4 Computational Instruction Format

Each floating-point instruction can be applied to a number of operand
formats. The operand format for an instruction is specified by the 5-bit
format (fmt) field; decoding for this field is shown in Table B-1.
Table B-1 Format Field Decoding

Code     Mnemonic    Size        Format
16       S           single      Binary floating-point
17       D           double      Binary floating-point
18       Reserved
19       Reserved
20       W           single      Binary fixed-point
21       L           longword    64-bit binary fixed-point
22-31    -           -           Reserved

The function field indicates which floating-point operation is to be
performed. Table B-2 lists all floating-point instructions.
Table B-2 Floating-Point Instructions and Operations

Code (5..0)   Mnemonic   Operation
 0            ADD        Add
 1            SUB        Subtract
 2            MUL        Multiply
 3            DIV        Divide
 4            SQRT       Square root
 5            ABS        Absolute value
 6            MOV        Move
 7            NEG        Negate
 8            ROUND.L    Convert to 64-bit fixed-point, rounded to nearest/even
 9            TRUNC.L    Convert to 64-bit fixed-point, rounded toward zero
10            CEIL.L     Convert to 64-bit fixed-point, rounded to +∞
11            FLOOR.L    Convert to 64-bit fixed-point, rounded to -∞
12            ROUND.W    Convert to single fixed-point, rounded to nearest/even
13            TRUNC.W    Convert to single fixed-point, rounded toward zero
14            CEIL.W     Convert to single fixed-point, rounded to +∞
15            FLOOR.W    Convert to single fixed-point, rounded to -∞
16-31                    Reserved
32            CVT.S      Convert to single floating-point
33            CVT.D      Convert to double floating-point
34                       Reserved
35                       Reserved
36            CVT.W      Convert to binary fixed-point
37            CVT.L      Convert to 64-bit binary fixed-point
38-47                    Reserved
48-63         C          Floating-point compare
In the following pages, the notation FGR refers to the FPU's 32
General-Purpose Registers FGR0 through FGR31, and FPR refers to the
FPU's Floating-Point Registers. When the FR bit in the Status register
(SR_26) equals zero, only the even Floating-Point Registers are valid
and the FPU's 32 General-Purpose Registers are 32 bits wide. When
the FR bit in the Status register (SR_26) equals one, both odd and even
Floating-Point Registers may be used and the FPU's 32
General-Purpose Registers are 64 bits wide.

The following routines are used in the description of the floating-point
operations to get the value of an FPR or to change the value of an FGR:



32 Bit Mode

value <- ValueFPR(fpr, fmt)
    /* undefined for odd fpr */
    case fmt of
        S, W:
            value <- FGR[fpr+0]
        D:
            /* undefined for fpr not even */
            value <- FGR[fpr+1] || FGR[fpr+0]
    end

StoreFPR(fpr, fmt, value):
    /* undefined for odd fpr */
    case fmt of
        S, W:
            FGR[fpr+1] <- undefined
            FGR[fpr+0] <- value
        D:
            FGR[fpr+1] <- value_63..32
            FGR[fpr+0] <- value_31..0
    end
64 Bit Mode

value <- ValueFPR(fpr, fmt)
    case fmt of
        S:
            value <- FGR[fpr]_31..0
        D, L:
            value <- FGR[fpr]
        W:
            value <- FGR[fpr]
    end

StoreFPR(fpr, fmt, value):
    case fmt of
        S, W:
            FGR[fpr] <- undefined^32 || value
        D, L:
            FGR[fpr] <- value
    end
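
The two register-access routines can be modeled in Python to make the effect of the FR bit concrete. This is a sketch under stated assumptions: the class and method names are invented here, bits the manual calls "undefined" are simply written as zero, and W values are masked to 32 bits in 64-bit mode.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


class FpuRegisters:
    def __init__(self, fr: int):
        self.fr = fr              # FR bit of the Status register
        self.fgr = [0] * 32       # FGR0 through FGR31

    def _check(self, fpr: int) -> None:
        if self.fr == 0 and fpr % 2:
            raise ValueError("odd FPR is undefined when FR = 0")

    def value_fpr(self, fpr: int, fmt: str) -> int:
        self._check(fpr)
        if fmt in ("S", "W"):
            return self.fgr[fpr] & MASK32
        if fmt == "D" and self.fr == 0:
            # A double occupies an even/odd pair of 32-bit registers.
            return ((self.fgr[fpr + 1] & MASK32) << 32) | (self.fgr[fpr] & MASK32)
        if fmt in ("D", "L") and self.fr == 1:
            return self.fgr[fpr] & MASK64
        raise ValueError("format %r is not valid in this mode" % fmt)

    def store_fpr(self, fpr: int, fmt: str, value: int) -> None:
        self._check(fpr)
        if fmt in ("S", "W"):
            if self.fr == 0:
                self.fgr[fpr + 1] = 0          # "undefined" modeled as zero
            self.fgr[fpr] = value & MASK32     # upper half "undefined" when FR = 1
        elif fmt == "D" and self.fr == 0:
            self.fgr[fpr + 1] = (value >> 32) & MASK32
            self.fgr[fpr] = value & MASK32
        elif fmt in ("D", "L") and self.fr == 1:
            self.fgr[fpr] = value & MASK64
        else:
            raise ValueError("format %r is not valid in this mode" % fmt)
```

Storing the double 1.0 (0x3FF0000000000000) into FPR 2 with FR = 0 places 0x3FF00000 in FGR3 and zero in FGR2.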
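ABS.fmt        Floating-Point Absolute Value        ABS.fmt

 31       26 25      21 20      16 15      11 10       6 5         0
+-----------+----------+----------+----------+----------+-----------+
|   COP1    |   fmt    |          |    fs    |    fd    |    ABS    |
|  010001   |          |  00000   |          |          |  000101   |
+-----------+----------+----------+----------+----------+-----------+
      6          5          5          5          5           6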



Format: 

ABS.fmt fd, fs 
Description: 

The contents of the FPU register specified by fs are interpreted in the
specified format and the arithmetic absolute value is taken. The result
is placed in the floating-point register specified by fd.

The absolute value operation is arithmetic; a NaN operand signals
invalid operation.

This instruction is valid only for single- and double-precision
floating-point formats. The operation is not defined if bit 0 of any
register specification is set and the FR bit in the Status register
equals zero, since the register numbers specify an even-odd pair of
adjacent coprocessor general registers. When the FR bit in the Status
register equals one, both even and odd register numbers are valid.

Operation: 



T: StoreFPR(fd, fmt, AbsoluteValue(ValueFPR(fs, fmt)))



Exceptions: 

Coprocessor unusable exception 
Coprocessor exception trap 

Coprocessor Exceptions: 

Unimplemented operation exception 
Invalid operation exception 
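ADD.fmt        Floating-Point Add        ADD.fmt

 31       26 25      21 20      16 15      11 10       6 5         0
+-----------+----------+----------+----------+----------+-----------+
|   COP1    |   fmt    |    ft    |    fs    |    fd    |    ADD    |
|  010001   |          |          |          |          |  000000   |
+-----------+----------+----------+----------+----------+-----------+
      6          5          5          5          5           6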






Format: 


ADD.fmt fd, fs, ft

Description: 

The contents of the FPU registers specified by fs and ft are interpreted
in the specified format and arithmetically added. The result is
rounded as if calculated to infinite precision and then rounded to the
specified format (fmt), according to the current rounding mode. The
result is placed in the floating-point register (FPR) specified by fd.

This instruction is valid only for single- and double-precision
floating-point formats. The operation is not defined if bit 0 of any
register specification is set and the FR bit in the Status register
equals zero, since the register numbers specify an even-odd pair of
adjacent coprocessor general registers. When the FR bit in the Status
register equals one, both even and odd register numbers are valid.

Operation: 



T: StoreFPR (fd, fmt, ValueFPR(fs, fmt) + ValueFPR(ft, fmt)) 



Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Unimplemented operation exception 
Invalid operation exception 
Inexact exception 
Overflow exception 
Underflow exception 
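
The "infinite precision, then round" rule can be observed for the single format with a small Python sketch. It assumes the round-to-nearest mode only, uses the standard `struct` module to round a Python float (a double) to single precision, and the function names are illustrative. For addition, summing two singles in double precision and rounding once to single gives the same result as a correctly rounded single-precision add.

```python
import struct


def round_to_single(x: float) -> float:
    """Round a double to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def add_s(a: float, b: float) -> float:
    """Model ADD.S under round-to-nearest: exact sum, then one rounding."""
    return round_to_single(round_to_single(a) + round_to_single(b))
```

Adding 2^-24 to 1.0 lands exactly halfway between two single-precision values and rounds to the even neighbor, 1.0, which is the case that would raise the Inexact exception on the FPU.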
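BC1F        Branch On FPA False (coprocessor 1)        BC1F

 31       26 25      21 20      16 15                            0
+-----------+----------+----------+-------------------------------+
|   COP1    |    BC    |   BCF    |            offset             |
|  010001   |  01000   |  00000   |                               |
+-----------+----------+----------+-------------------------------+
      6          5          5                     16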



Format: 

BC1F offset 
Description: 

A branch target address is computed from the sum of the address of 
the instruction in the delay slot and the 16-bit offset, shifted left two 
bits and sign-extended. If the result of the last floating-point compare 
is false, the program branches to the target address, with a delay of 
one instruction. 

Operation:

32 T-1: condition <- not COC[1]
   T:   target <- (offset_15)^14 || offset || 0^2
   T+1: if condition then
            PC <- PC + target
        endif

64 T-1: condition <- not COC[1]
   T:   target <- (offset_15)^38 || offset || 0^2
   T+1: if condition then
            PC <- PC + target
        endif



Exceptions: 

Coprocessor unusable exception 
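
The branch target computation shared by the floating-point branches can be sketched as follows. This Python model is an illustration under stated assumptions: the function name and parameters are invented here, instructions are taken to be 4 bytes long, and the return value is the address of the instruction executed after the delay slot.

```python
def bc1_next_pc(branch_pc: int, offset: int, coc1: bool,
                branch_on_true: bool, bits: int = 32) -> int:
    """Address executed after the delay slot of a BC1T/BC1F at branch_pc."""
    mask = (1 << bits) - 1
    offset &= 0xFFFF
    signed = offset - 0x10000 if offset & 0x8000 else offset
    target = signed << 2                    # sign-extended offset || 0^2
    delay_slot = (branch_pc + 4) & mask     # address of the delay-slot instruction
    condition = coc1 if branch_on_true else not coc1
    if condition:
        return (delay_slot + target) & mask
    return (delay_slot + 4) & mask          # fall through past the delay slot
```

A BC1F at 0x1000 with offset 3 and a false compare result continues at 0x1004 + 12 = 0x1010; with offset 0xFFFF (that is, -1) it continues at 0x1000, the branch itself.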
BC1FL        Branch On FPU False Likely (coprocessor 1)        BC1FL

 31       26 25      21 20      16 15                            0
+-----------+----------+----------+-------------------------------+
|   COP1    |    BC    |   BCFL   |            offset             |
|  010001   |  01000   |  00010   |                               |
+-----------+----------+----------+-------------------------------+
      6          5          5                     16



Format: 

BC1FL offset 

Description: 

A branch target address is computed from the sum of the address of 
the instruction in the delay slot and the 16-bit offset, shifted left two 
bits and sign-extended. 

If the result of the last floating-point compare is false, the program 
branches to the target address, with a delay of one instruction. If the 
conditional branch is not taken, the instruction in the branch delay slot 
is nullified. 

Operation:

32 T-1: condition <- not COC[1]
   T:   target <- (offset_15)^14 || offset || 0^2
   T+1: if condition then
            PC <- PC + target
        else
            NullifyCurrentInstruction
        endif

64 T-1: condition <- not COC[1]
   T:   target <- (offset_15)^38 || offset || 0^2
   T+1: if condition then
            PC <- PC + target
        else
            NullifyCurrentInstruction
        endif



Exceptions: 

Coprocessor unusable exception

BC1T             Branch On FPU True (coprocessor 1)              BC1T

Bits:    31..26    25..21    20..16    15..0
Field:   COP1      BC        BCT       offset
Value:   010001    01000     00001
Width:   6         5         5         16

Format: 



BC1T offset 



Description: 

A branch target address is computed from the sum of the address of 
the instruction in the delay slot and the 16-bit offset, shifted left two 
bits and sign-extended. If the result of the last floating-point compare 
is true, the program branches to the target address, with a delay of one 
instruction. 

Operation:

32   T-1: condition <- COC[1]
     T:   target <- (offset_15)^14 || offset || 0^2
     T+1: if condition then
              PC <- PC + target
          endif

64   T-1: condition <- COC[1]
     T:   target <- (offset_15)^38 || offset || 0^2
     T+1: if condition then
              PC <- PC + target
          endif



Exceptions: 

Coprocessor unusable exception

BC1TL         Branch On FPU True Likely (coprocessor 1)          BC1TL

Bits:    31..26    25..21    20..16    15..0
Field:   COP1      BC        BCTL      offset
Value:   010001    01000     00011
Width:   6         5         5         16

Format: 

BC1TL offset 
Description: 

A branch target address is computed from the sum of the address of 
the instruction in the delay slot and the 16-bit offset, shifted left two 
bits and sign-extended. 

If the result of the last floating-point compare is true, the program 
branches to the target address, with a delay of one instruction. If the 
conditional branch is not taken, the instruction in the branch delay slot 
is nullified. 
Operation:

32   T-1: condition <- COC[1]
     T:   target <- (offset_15)^14 || offset || 0^2
     T+1: if condition then
              PC <- PC + target
          else
              NullifyCurrentInstruction
          endif

64   T-1: condition <- COC[1]
     T:   target <- (offset_15)^38 || offset || 0^2
     T+1: if condition then
              PC <- PC + target
          else
              NullifyCurrentInstruction
          endif



Exceptions: 

Coprocessor unusable exception

C.cond.fmt              Floating-Point Compare              C.cond.fmt

Bits:    31..26    25..21    20..16    15..11    10..6    5..4    3..0
Field:   COP1      fmt       ft        fs        0        FC*     cond*
Value:   010001                                  00000
Width:   6         5         5         5         5        2       4

Format: 

C.cond.fmt fs, ft 
Description: 

The contents of the floating-point registers specified by fs and ft are in- 
terpreted in the specified format and arithmetically compared. 
A result is determined based on the comparison and the conditions 
specified in the instruction. If one of the values is a Not a Number 
(NaN), and the high-order bit of the condition field is set, an invalid op- 
eration exception is taken. After a one-instruction delay, the condition 
is available for testing with branch on floating-point coprocessor con- 
dition instructions. 

Comparisons are exact and can neither overflow nor underflow. Four 
mutually exclusive relations are possible results: less than, equal, 
greater than, and unordered. The last case arises when one or both of 
the operands are NaN; every NaN compares unordered with every- 
thing, including itself. Comparisons ignore the sign of zero, so 
+0 = -0. 
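
The way the condition field selects among these relations can be sketched in Python. The sketch follows the Operation pseudocode for this instruction (bit 2 of cond selects "less", bit 1 "equal", bit 0 "unordered", and bit 3 requests an invalid operation exception on a NaN operand). The names fp_compare and InvalidOperation are assumptions for illustration, and the exception is raised unconditionally rather than modeling the enable bit.

```python
import math

class InvalidOperation(Exception):
    """Stand-in for the FPU invalid operation exception."""

def fp_compare(cond, a, b):
    """Evaluate a 4-bit C.cond.fmt condition on two Python floats."""
    if math.isnan(a) or math.isnan(b):
        less, equal, unordered = False, False, True
        if cond & 0b1000:
            raise InvalidOperation("NaN operand with cond bit 3 set")
    else:
        less, equal, unordered = a < b, a == b, False
    # The result is the OR of the relations selected by cond bits 2..0.
    return bool((cond & 0b100 and less) or
                (cond & 0b010 and equal) or
                (cond & 0b001 and unordered))
```

For example, a condition value of 0b0010 (equal only) is true for +0 and -0, and false whenever either operand is a NaN.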

This instruction is valid only for single- and double-precision
floating-point formats. The operation is not defined if bit 0 of any
register specification is set and the FR bit in the Status register
equals zero, since the register numbers specify an even-odd pair of
adjacent coprocessor general registers. When the FR bit in the Status
register equals one, both even and odd register numbers are valid.

*See "FPU Instruction Opcode Bit Encoding" at the end of
Appendix B.

Operation:

T:   if NaN(ValueFPR(fs, fmt)) or NaN(ValueFPR(ft, fmt)) then
         less <- false
         equal <- false
         unordered <- true
         if cond_3 then
             signal InvalidOperationException
         endif
     else
         less <- ValueFPR(fs, fmt) < ValueFPR(ft, fmt)
         equal <- ValueFPR(fs, fmt) = ValueFPR(ft, fmt)
         unordered <- false
     endif
     condition <- (cond_2 and less) or (cond_1 and equal) or
                  (cond_0 and unordered)
     FCR[31]_23 <- condition
     COC[1] <- condition



Exceptions: 

Coprocessor unusable 
Floating-Point exception 

Coprocessor Exceptions: 

Unimplemented operation exception 
Invalid operation exception

CEIL.L.fmt   Floating-Point Ceiling to Long Fixed-Point Format   CEIL.L.fmt

Bits:    31..26    25..21    20..16    15..11    10..6    5..0
Field:   COP1      fmt       0         fs        fd       CEIL.L
Value:   010001              00000                        001010
Width:   6         5         5         5         5        6

Format: 

CEIL.L.fmt fd, fs 
Description: 

The contents of the floating-point register specified by fs are
interpreted in the specified source format, fmt, and arithmetically
converted to the long fixed-point format. The result is placed in the
floating-point register specified by fd.

Regardless of the setting of the current rounding mode, the conversion
is rounded as if the current rounding mode is round to +∞ (2).

This instruction is valid only for conversion from single-, double-,
extended-, or quad-precision floating-point formats. If extended- or
quad-precision format is specified, the operation is not defined if
bit 0 of the source register specification is set, since the register
number specifies an aligned coprocessor general register. When the FR
bit in the Status register equals one, both even and odd register
numbers are valid.

When the source operand is an Infinity or NaN, or the correctly rounded
integer result is outside of -2^63 to 2^63 - 1, the Invalid operation
exception is raised. If the Invalid operation exception is not enabled,
then no exception is taken and 2^63 - 1 is returned.

This instruction is not implemented on MIPS I or MIPS II processors,
and will cause an unimplemented operation exception to occur.

Operation: 



T: StoreFPR(fd, L, ConvertFmt(ValueFPR(fs, fmt), fmt, L)) 



Exceptions: 

Coprocessor unusable exception 

Coprocessor Interrupt (R2000, R3000,or R6000) 

Floating-Point exception (R4000) 

Coprocessor Exceptions: 

Invalid operation exception 
Unimplemented operation exception 
Inexact exception 
Overflow exception

CEIL.W.fmt  Floating-Point Ceiling to Single Fixed-Point Format  CEIL.W.fmt

Bits:    31..26    25..21    20..16    15..11    10..6    5..0
Field:   COP1      fmt       0         fs        fd       CEIL.W
Value:   010001              00000                        001110
Width:   6         5         5         5         5        6

Format:

CEIL.W.fmt fd, fs

Description: 

The contents of the floating-point register specified by fs are
interpreted in the specified source format, fmt, and arithmetically
converted to the single fixed-point format. The result is placed in the
floating-point register specified by fd.

Regardless of the setting of the current rounding mode, the conversion
is rounded as if the current rounding mode is round to +∞ (2).

This instruction is valid only for conversion from single- or
double-precision floating-point formats. The operation is not defined
if bit 0 of any register specification is set and the FR bit in the
Status register equals zero, since the register numbers specify an
even-odd pair of adjacent coprocessor general registers. When the FR
bit in the Status register equals one, both even and odd register
numbers are valid.

When the source operand is an Infinity or NaN, or the correctly rounded
integer result is outside of -2^31 to 2^31 - 1, the Invalid operation
exception is raised. If the Invalid operation exception is not enabled,
then no exception is taken and 2^31 - 1 is returned.

Operation: 



StoreFPR(fd, W, ConvertFmt(ValueFPR(fs, fmt), fmt, W)) 
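
As an informal illustration of the conversion and its out-of-range behavior, the following Python sketch rounds toward +∞ and returns 2^31 - 1 when the result cannot be represented. The function name, the invalid_enabled parameter, and the use of ArithmeticError to stand in for the Invalid operation exception are all assumptions made for this example.

```python
import math

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def ceil_w(value, invalid_enabled=False):
    """Sketch of CEIL.W.fmt on a Python float.

    Returns (result, invalid_flag). When the Invalid operation exception
    is enabled, an out-of-range or non-finite operand raises instead.
    """
    if math.isnan(value) or math.isinf(value):
        in_range = False
    else:
        result = math.ceil(value)              # round toward +infinity
        in_range = INT32_MIN <= result <= INT32_MAX
    if not in_range:
        if invalid_enabled:
            raise ArithmeticError("invalid operation exception")
        return INT32_MAX, True                 # 2^31 - 1 is returned
    return result, False
```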



Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Invalid operation exception 
Unimplemented operation exception 
Inexact exception 
Overflow exception

CFC1         Move Control Word From FPU (coprocessor 1)          CFC1

Bits:    31..26    25..21    20..16    15..11    10..0
Field:   COP1      CF        rt        fs        0
Value:   010001    00010                         000 0000 0000
Width:   6         5         5         5         11

Format: 

CFC1 rt, fs
Description:

The contents of the FPU's control register fs are loaded into general
register rt.

This operation is only defined when fs equals 0 or 31.
The contents of general register rt are undefined for time T of the
instruction immediately following this load instruction.

Operation:

32   T:   temp <- FCR[fs]
     T+1: GPR[rt] <- temp

64   T:   temp <- FCR[fs]
     T+1: GPR[rt] <- (temp_31)^32 || temp



Exceptions: 

Coprocessor unusable exception

CTC1          Move Control Word To FPU (coprocessor 1)           CTC1

Bits:    31..26    25..21    20..16    15..11    10..0
Field:   COP1      CT        rt        fs        0
Value:   010001    00110                         000 0000 0000
Width:   6         5         5         5         11

Format: 

CTC1 rt, fs
Description:

The contents of general register rt are loaded into the FPU's control
register fs. This operation is only defined when fs equals 0 or 31.
Writing to Control Register 31, the floating-point Control/Status register,
causes an interrupt or exception if any cause bit and its corresponding
enable bit are both set. The register will be written before the exception
occurs. The contents of floating-point control register fs are undefined
for time T of the instruction immediately following this load instruction.
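
The cause/enable check can be sketched as below. The bit positions are assumptions taken from the usual layout of the Control/Status register (cause bits E V Z O U I in bits 17..12, enable bits V Z O U I in bits 11..7), which this page does not restate; the unimplemented-operation cause bit E has no enable bit and is assumed to trap whenever it is set. The function name is hypothetical.

```python
CAUSE_SHIFT = 12    # assumed: cause bits occupy FCR31 bits 17..12
ENABLE_SHIFT = 7    # assumed: enable bits occupy FCR31 bits 11..7

def ctc1_traps(fcr31):
    """True if writing this value to FCR31 would raise an exception."""
    cause = (fcr31 >> CAUSE_SHIFT) & 0x3F
    enable = (fcr31 >> ENABLE_SHIFT) & 0x1F
    unimplemented = bool(cause & 0x20)       # E bit, no matching enable
    # Any of V, Z, O, U, I set in both the cause and enable fields traps.
    return unimplemented or bool(cause & 0x1F & enable)
```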
Operation:

32   T:   temp <- GPR[rt]
     T+1: FCR[fs] <- temp
          COC[1] <- FCR[31]_23

64   T:   temp <- GPR[rt]_31..0
     T+1: FCR[fs] <- temp
          COC[1] <- FCR[31]_23



Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Unimplemented operation exception 
Invalid operation exception 
Division by zero exception 
Inexact exception 
Overflow exception 
Underflow exception

CVT.D.fmt  Floating-Point Convert to Double Floating-Point Format  CVT.D.fmt

Bits:    31..26    25..21    20..16    15..11    10..6    5..0
Field:   COP1      fmt       0         fs        fd       CVT.D
Value:   010001              00000                        100001
Width:   6         5         5         5         5        6

Format:

CVT.D.fmt fd, fs

Description: 

The contents of the floating-point register specified by fs are
interpreted in the specified source format, fmt, and arithmetically
converted to the double binary floating-point format. The result is
placed in the floating-point register specified by fd.

This instruction is valid only for conversions from single
floating-point format, or from 32-bit or 64-bit fixed-point format.
If the single floating-point or single fixed-point format is specified,
the operation is exact. The operation is not defined if bit 0 of any
register specification is set and the FR bit in the Status register
equals zero, since the register numbers specify an even-odd pair of
adjacent coprocessor general registers. When the FR bit in the Status
register equals one, both even and odd register numbers are valid.

Operation: 



StoreFPR (fd, D, ConvertFmt(ValueFPR(fs, fmt), fmt, D)) 



Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Invalid operation exception 
Unimplemented operation exception 
Inexact exception 
Overflow exception 
Underflow exception

CVT.L.fmt   Floating-Point Convert to Long Fixed-Point Format   CVT.L.fmt

Bits:    31..26    25..21    20..16    15..11    10..6    5..0
Field:   COP1      fmt       0         fs        fd       CVT.L
Value:   010001              00000                        100101
Width:   6         5         5         5         5        6

Format: 



CVT.L.fmt fd, fs 



Description: 

The contents of the floating-point register specified by fs are
interpreted in the specified source format, fmt, and arithmetically
converted to the long fixed-point format. The result is placed in the
floating-point register specified by fd.

This instruction is valid only for conversions from single-, double-,
extended-, or quad-precision floating-point formats. If extended- or
quad-precision format is specified, the operation is not defined if
bit 0 of the source register specification is set, since the register
number specifies an aligned coprocessor general register.

When the source operand is an Infinity or NaN, or the correctly rounded
integer result is outside of -2^63 to 2^63 - 1, the Invalid operation
exception is raised. If the Invalid operation exception is not enabled,
then no exception is taken and 2^63 - 1 is returned.

This instruction is not implemented on MIPS I or MIPS II processors,
and will cause an unimplemented operation exception to occur.

Operation: 



T: StoreFPR (fd, L, ConvertFmt(ValueFPR(fs, fmt), fmt, L)) 



Exceptions: 

Coprocessor unusable exception 

Coprocessor Interrupt (R2000, R3000, or R6000) 

Floating-Point exception (R4000) 

Coprocessor Exceptions: 

Invalid operation exception 
Unimplemented operation exception 
Inexact exception 
Overflow exception

CVT.S.fmt  Floating-Point Convert to Single Floating-Point Format  CVT.S.fmt

Bits:    31..26    25..21    20..16    15..11    10..6    5..0
Field:   COP1      fmt       0         fs        fd       CVT.S
Value:   010001              00000                        100000
Width:   6         5         5         5         5        6

Format: 



CVT.S.fmt fd, fs 



Description: 

The contents of the floating-point register specified by fs are interpret- 
ed in the specified source format, fmt, and arithmetically converted to 
the single binary floating-point format. The result is placed in the 
floating-point register specified by fd. Rounding occurs according to 
the currently specified rounding mode. 

This instruction is valid only for conversions from double
floating-point format, or from 32-bit or 64-bit fixed-point format. The
operation is not defined if bit 0 of any register specification is set
and the FR bit in the Status register equals zero, since the register
numbers specify an even-odd pair of adjacent coprocessor general
registers. When the FR bit in the Status register equals one, both even
and odd register numbers are valid.
Operation: 



T: StoreFPR(fd, S, ConvertFmt(ValueFPR(fs, fmt), fmt, S))



Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Invalid operation exception 
Unimplemented operation exception 
Inexact exception 
Overflow exception 
Underflow exception 



R4000 User's Manual-Preliminary 






Appendix B 



CVT.W.fmt  Floating-Point Convert to Single Fixed-Point Format  CVT.W.fmt

Bits:    31..26    25..21    20..16    15..11    10..6    5..0
Field:   COP1      fmt       0         fs        fd       CVT.W
Value:   010001              00000                        100100
Width:   6         5         5         5         5        6

Format: 

CVT.W.fmt fd, fs 

Description: 

The contents of the floating-point register specified by fs are interpret- 
ed in the specified source format, fmt, and arithmetically converted to 
the single fixed-point format. The result is placed in the floating-point 
register specified by fd. 

This instruction is valid only for conversion from single- or
double-precision floating-point formats. The operation is not defined
if bit 0 of any register specification is set and the FR bit in the
Status register equals zero, since the register numbers specify an
even-odd pair of adjacent coprocessor general registers. When the FR
bit in the Status register equals one, both even and odd register
numbers are valid.

When the source operand is an Infinity or NaN, or the correctly rounded
integer result is outside of -2^31 to 2^31 - 1, an Invalid operation
exception is raised. If the Invalid operation exception is not enabled,
then no exception is taken and 2^31 - 1 is returned.
Operation: 



StoreFPR(fd, W, ConvertFmt(ValueFPR(fs, fmt), fmt, W)) 
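
A minimal sketch of ConvertFmt to the W format is given below, assuming the conversion follows the current rounding mode and the Invalid operation exception is disabled. The rounding-mode encodings 2 (+∞) and 3 (-∞) match the values quoted for the CEIL and FLOOR instructions; 0 (nearest, ties to even) and 1 (toward zero) are assumed from the general FPU definition and are not stated on this page. All names are hypothetical.

```python
import math

# Assumed rounding-mode (RM) encodings; see the lead-in for provenance.
ROUNDERS = {
    0: round,        # round to nearest, ties to even
    1: math.trunc,   # round toward zero
    2: math.ceil,    # round toward +infinity
    3: math.floor,   # round toward -infinity
}

def cvt_w(value, rm):
    """Convert a Python float to a 32-bit integer under rounding mode rm."""
    limit = 2**31 - 1
    if math.isnan(value) or math.isinf(value):
        return limit                       # invalid operand: 2^31 - 1
    result = ROUNDERS[rm](value)
    if not -2**31 <= result <= limit:
        return limit                       # out of range: 2^31 - 1
    return result
```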



Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Invalid operation exception 
Unimplemented operation exception 
Inexact exception 
Overflow exception

DIV.fmt                 Floating-Point Divide                 DIV.fmt

Bits:    31..26    25..21    20..16    15..11    10..6    5..0
Field:   COP1      fmt       ft        fs        fd       DIV
Value:   010001                                           000011
Width:   6         5         5         5         5        6

Format: 



DIV.fmt fd,fs,ft 



Description: 

The contents of the floating-point registers specified by fs and ft are in- 
terpreted in the specified format and arithmetically divided. The result 
is rounded as if calculated to infinite precision and then rounded to 
the specified format, according to the current rounding mode. The re- 
sult is placed in the floating-point register specified by fd. 
This instruction is valid only for single- or double-precision
floating-point formats.

The operation is not defined if bit 0 of any register specification is
set and the FR bit in the Status register equals zero, since the
register numbers specify an even-odd pair of adjacent coprocessor
general registers. When the FR bit in the Status register equals one,
both even and odd register numbers are valid.

Operation: 



T: StoreFPR (fd, fmt, ValueFPR(fs, fmt) / ValueFPR(ft, fmt))



Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Unimplemented operation exception 
Invalid operation exception 
Division-by-zero exception 
Inexact exception 
Overflow exception 
Underflow exception

DMFC1     Doubleword Move From Floating-Point Coprocessor      DMFC1

Bits:    31..26    25..21    20..16    15..11    10..0
Field:   COP1      DMF       rt        fs        0
Value:   010001    00001                         000 0000 0000
Width:   6         5         5         5         11

Format: 

DMFC1 rt, fs

Description:

The contents of register fs from the floating-point coprocessor are
stored into processor register rt.

The contents of general register rt are undefined for time T of the
instruction immediately following this load instruction.

The FR bit in the Status register specifies whether all 32 registers of
the R4000 are addressable. When FR is clear, this instruction is not
defined when the least significant bit of fs is non-zero. When FR is
set, fs may specify either odd or even registers.

Operation:

64   T:   if SR_26 = 1 then
              data <- CPR[1, fs]
          else
              data <- CPR[1, fs_4..1 || 0]
          endif
     T+1: GPR[rt] <- data



Exceptions: 

Coprocessor unusable exception

DMTC1      Doubleword Move To Floating-Point Coprocessor       DMTC1

Bits:    31..26    25..21    20..16    15..11    10..0
Field:   COP1      DMT       rt        fs        0
Value:   010001    00101                         000 0000 0000
Width:   6         5         5         5         11

Format: 



DMTC1 rt, fs

Description:

The contents of general register rt are loaded into coprocessor
register fs of the CP1.

The contents of floating-point register fs are undefined for time T of
the instruction immediately following this load instruction.

The FR bit in the Status register specifies whether all 32 registers of
the R4000 are addressable. When FR equals zero, this instruction is not
defined when the least significant bit of fs is non-zero. When FR equals
one, fs may specify either odd or even registers.

Operation:

64   T:   data <- GPR[rt]
     T+1: if SR_26 = 1 then
              CPR[1, fs] <- data
          else
              CPR[1, fs_4..1 || 0] <- data
          endif



Exceptions: 

Coprocessor unusable exception

FLOOR.L.fmt   Floating-Point Floor to Long Fixed-Point Format   FLOOR.L.fmt

Bits:    31..26    25..21    20..16    15..11    10..6    5..0
Field:   COP1      fmt       0         fs        fd       FLOOR.L
Value:   010001              00000                        001011
Width:   6         5         5         5         5        6

Format: 

FLOOR.L.fmt fd, fs 

Description: 

The contents of the floating-point register specified by fs are
interpreted in the specified source format, fmt, and arithmetically
converted to the long fixed-point format. The result is placed in the
floating-point register specified by fd.

Regardless of the setting of the current rounding mode, the conversion
is rounded as if the current rounding mode is round to -∞ (3).

This instruction is valid only for conversion from single-, double-,
extended-, or quad-precision floating-point formats. If extended- or
quad-precision format is specified, the operation is not defined if
bit 0 of the source register specification is set, since the register
number specifies an aligned coprocessor general register.

When the source operand is an Infinity or NaN, or the correctly rounded
integer result is outside of -2^63 to 2^63 - 1, the Invalid operation
exception is raised. If the Invalid operation exception is not enabled,
then no exception is taken and 2^63 - 1 is returned.

This instruction is not implemented on MIPS I or MIPS II processors,
and will cause an unimplemented operation exception to occur.

Operation: 



T: StoreFPR(fd, L, ConvertFmt(ValueFPR(fs, fmt), fmt, L))



Exceptions: 

Coprocessor unusable exception 

Coprocessor Interrupt (R2000, R3000, or R6000) 

Floating-Point exception (R4000) 

Coprocessor Exceptions: 

Invalid operation exception 
Unimplemented operation exception 
Inexact exception 
Overflow exception

FLOOR.W.fmt  Floating-Point Floor to Single Fixed-Point Format  FLOOR.W.fmt

Bits:    31..26    25..21    20..16    15..11    10..6    5..0
Field:   COP1      fmt       0         fs        fd       FLOOR.W
Value:   010001              00000                        001111
Width:   6         5         5         5         5        6

Format:

FLOOR.W.fmt fd, fs

Description: 

The contents of the floating-point register specified by fs are interpret- 
ed in the specified source format, fmt, and arithmetically converted to 
the single fixed-point format. The result is placed in the floating-point 
register specified by fd. 

Regardless of the setting of the current rounding mode, the conversion
is rounded as if the current rounding mode is round to -∞ (RM = 3).

This instruction is valid only for conversion from single- or
double-precision floating-point formats. The operation is not defined
if bit 0 of any register specification is set and the FR bit in the
Status register equals zero, since the register numbers specify an
even-odd pair of adjacent coprocessor general registers. When the FR
bit in the Status register equals one, both even and odd register
numbers are valid.

When the source operand is an Infinity or NaN, or the correctly rounded
integer result is outside of -2^31 to 2^31 - 1, an Invalid operation
exception is raised. If the Invalid operation exception is not enabled,
then no exception is taken and 2^31 - 1 is returned.

Operation: 



T: StoreFPR(fd, W, ConvertFmt(ValueFPR(fs, fmt), fmt, W))



Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Invalid operation exception 
Unimplemented operation exception 
Inexact exception 
Overflow exception

LDC1           Load Doubleword to FPU (coprocessor 1)            LDC1

Format: 

LDC1 ft, offset(base)

Description:

The 16-bit offset is sign-extended and added to the contents of general
register base to form an unsigned effective address. In 32-bit mode, the
contents of the doubleword at the memory location specified by the
effective address are loaded into registers ft and ft+1 of the
floating-point coprocessor. This instruction is not valid, and is
undefined, when the least significant bit of ft is non-zero. In 64-bit
mode, the contents of the doubleword at the memory location specified by
the effective address are loaded into the 64-bit register ft of the
floating-point coprocessor. The FR bit of the Status register (SR_26)
specifies whether all 32 registers of the R4000 are addressable. When
FR=0, this instruction is not defined when the least significant bit of
ft is non-zero. When FR=1, ft may specify either odd or even registers.

If any of the three least-significant bits of the effective address are
non-zero, an address error exception takes place.

Operation:

32   T:   vAddr <- ((offset_15)^16 || offset_15..0) + GPR[base]
          (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
          if BigEndianCPU = 1 then
              CPR[1, ft+1] <- LoadMemory (uncached, WORD,
                                          pAddr+0, vAddr+0, DATA)
              CPR[1, ft+0] <- LoadMemory (uncached, WORD,
                                          pAddr+4, vAddr+4, DATA)
          else
              CPR[1, ft+0] <- LoadMemory (uncached, WORD,
                                          pAddr+0, vAddr+0, DATA)
              CPR[1, ft+1] <- LoadMemory (uncached, WORD,
                                          pAddr+4, vAddr+4, DATA)
          endif

64   T:   vAddr <- ((offset_15)^48 || offset_15..0) + GPR[base]
          (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
          data <- LoadMemory(uncached, DOUBLEWORD, pAddr, vAddr, DATA)
          if SR_26 = 1 then
              CPR[1, ft] <- data
          else
              CPR[1, ft_4..1 || 0] <- data
          endif



Exceptions: 

Coprocessor unusable 
TLB refill exception 
TLB invalid exception 
Bus error exception 
Address error exception 
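
The effective-address calculation and alignment check for LDC1 can be sketched as follows. This is a simplified model: address translation is omitted, and the names ldc1_effective_address, ldc1_dest_register, and AddressError are assumptions introduced for the example.

```python
class AddressError(Exception):
    """Stand-in for the address error exception."""

def ldc1_effective_address(base_value, offset, addr_bits=32):
    """Sign-extend the 16-bit offset, add the base, and check alignment."""
    offset &= 0xFFFF
    if offset & 0x8000:
        offset -= 0x10000
    vaddr = (base_value + offset) & ((1 << addr_bits) - 1)
    if vaddr & 0x7:                # any of the three low bits set
        raise AddressError(hex(vaddr))
    return vaddr

def ldc1_dest_register(ft, fr):
    """Register index written in 64-bit mode, following the pseudocode:
    ft itself when FR = 1, otherwise ft with bit 0 forced to zero."""
    return ft if fr else ft & ~1
```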






LWC1              Load Word to FPU (coprocessor 1)               LWC1

Bits:    31..26    25..21    20..16    15..0
Field:   LWC1      base      ft        offset
Value:   110001
Width:   6         5         5         16

Format: 



LWC1 ft, offset(base)

Description:

The 16-bit offset is sign-extended and added to the contents of general
register base to form an unsigned effective address. The contents of the
word at the memory location specified by the effective address are
loaded into register ft of the floating-point coprocessor.

The FR bit of the Status register specifies whether all 64-bit
Floating-Point Registers are addressable. If FR equals zero, LWC1 loads
either the high or low half of the 16 even Floating-Point Registers. If
FR equals one, LWC1 loads the low 32 bits of both even and odd
Floating-Point Registers.

If either of the two least-significant bits of the effective address is
non-zero, an address error exception occurs.

Operation:

32   T:   vAddr <- ((offset_15)^16 || offset_15..0) + GPR[base]
          (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
          CPR[1, ft] <- LoadMemory (uncached, WORD,
                                    pAddr, vAddr, DATA)

64   T:   vAddr <- ((offset_15)^48 || offset_15..0) + GPR[base]
          (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
          pAddr <- pAddr_PSIZE-1..3 || (pAddr_2..0 xor (ReverseEndian || 0^2))
          mem <- LoadMemory(uncached, WORD, pAddr, vAddr, DATA)
          byte <- vAddr_2..0 xor (BigEndianCPU || 0^2)
          if SR_26 = 1 then
              CPR[1, ft] <- undefined^32 || mem_31+8*byte..8*byte
          else if ft_0 = 0 then
              CPR[1, ft_4..1 || 0] <- CPR[1, ft_4..1 || 0]_63..32 ||
                                      mem_31+8*byte..8*byte
          else
              CPR[1, ft_4..1 || 0] <- mem_31+8*byte..8*byte ||
                                      CPR[1, ft_4..1 || 0]_31..0
          endif



Exceptions: 

Coprocessor unusable 
TLB refill exception 
TLB invalid exception 
Bus error exception 
Address error exception

MFC1                Move From FPU (coprocessor 1)                MFC1

Bits:    31..26    25..21    20..16    15..11    10..0
Field:   COP1      MF        rt        fs        0
Value:   010001    00000                         000 0000 0000
Width:   6         5         5         5         11

Format: 

MFC1 rt, fs

Description:

The contents of register fs from the floating-point coprocessor are
stored into processor register rt.

The contents of register rt are undefined for time T of the instruction
immediately following this load instruction.

The FR bit of the Status register specifies whether all 32 registers of
the R4000 are addressable. If FR equals zero, MFC1 stores either the
high or low half of the 16 even Floating-Point Registers. If FR equals
one, MFC1 stores the low 32 bits of both even and odd Floating-Point
Registers.
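
The register-half selection described above can be sketched in Python for 64-bit mode. The model treats the register file as a list of 32 64-bit integers and follows the prose description (FR selects between half-of-an-even-register and low-word-of-any-register access); the function name and data layout are assumptions for illustration.

```python
def mfc1(fpr, fs, fr):
    """Return the 64-bit value written to GPR[rt] by MFC1 (a sketch).

    fpr: list of 32 integers, each holding a 64-bit FPU register.
    fs:  source register number (0..31).
    fr:  value of the Status register FR bit (0 or 1).
    """
    if fr or fs & 1 == 0:
        # FR = 1, or an even register number: low 32 bits of fs
        word = fpr[fs] & 0xFFFFFFFF
    else:
        # FR = 0 and odd fs: high 32 bits of the even register below it
        word = (fpr[fs & ~1] >> 32) & 0xFFFFFFFF
    if word & 0x80000000:          # sign-extend the word to 64 bits
        word |= 0xFFFFFFFF00000000
    return word
```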
Operation:

32   T:   data <- CPR[1, fs]
     T+1: GPR[rt] <- data

64   T:   if fs_0 = 0 then
              data <- CPR[1, fs_4..1 || 0]_31..0
          else
              data <- CPR[1, fs_4..1 || 0]_63..32
          endif
     T+1: GPR[rt] <- (data_31)^32 || data



Exceptions: 

Coprocessor unusable exception

MOV.fmt                  Floating-Point Move                  MOV.fmt

Bits:    31..26    25..21    20..16    15..11    10..6    5..0
Field:   COP1      fmt       0         fs        fd       MOV
Value:   010001              00000                        000110
Width:   6         5         5         5         5        6

Format: 

MOV.fmt fd, fs 
Description: 

The contents of the FPU register specified by fs are interpreted in the 
specified format and are copied into the FPU register specified by fd. 
The move operation is non-arithmetic; no IEEE 754 exceptions occur 
as a result of the instruction. 

This instruction is valid only for single- or double-precision floating- 
point formats. 

The operation is not defined if bit 0 of any register specification is
set and the FR bit in the Status register equals zero, since the
register numbers specify an even-odd pair of adjacent coprocessor
general registers. When the FR bit in the Status register equals one,
both even and odd register numbers are valid.

Operation: 



T: StoreFPR(fd, fmt, ValueFPR(fs, fmt))



Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Unimplemented operation exception

MTC1                 Move To FPU (coprocessor 1)                 MTC1

Bits:    31..26    25..21    20..16    15..11    10..0
Field:   COP1      MT        rt        fs        0
Value:   010001    00100                         000 0000 0000
Width:   6         5         5         5         11

Format: 

MTC1 rt, fs
Description:

The contents of register rt are loaded into the FPU's general register at
location fs.

The contents of floating-point register fs are undefined for time T of
the instruction immediately following this load instruction.

The FR bit of the Status register specifies whether all 32 registers of
the R4000 are addressable. If FR equals zero, MTC1 loads either the
high or low half of the 16 even Floating-Point Registers. If FR equals
one, MTC1 loads the low 32 bits of both even and odd Floating-Point
Registers.
Operation:

32   T:   data <- GPR[rt]
     T+1: CPR[1, fs] <- data

64   T:   data <- GPR[rt]_31..0
     T+1: if SR_26 = 1 then
              CPR[1, fs] <- undefined^32 || data
          else if fs_0 = 0 then
              CPR[1, fs_4..1 || 0] <- CPR[1, fs_4..1 || 0]_63..32 || data
          else
              CPR[1, fs_4..1 || 0] <- data || CPR[1, fs_4..1 || 0]_31..0
          endif



Exceptions: 

Coprocessor unusable exception

MUL.fmt                Floating-Point Multiply                MUL.fmt

Bits:    31..26    25..21    20..16    15..11    10..6    5..0
Field:   COP1      fmt       ft        fs        fd       MUL
Value:   010001                                           000010
Width:   6         5         5         5         5        6

Format: 



MUL.fmt fd, fs, ft 



Description: 

The contents of the floating-point registers specified by fs and ft are in- 
terpreted in the specified format and arithmetically multiplied. The re- 
sult is rounded as if calculated to infinite precision and then rounded 
to the specified format, according to the current rounding mode. The 
result is placed in the floating-point register specified by fd. 
This instruction is valid only for single- or double-precision floating- 
point formats. 

The operation is not defined if bit 0 of any register specification is
set and the FR bit in the Status register equals zero, since the
register numbers specify an even-odd pair of adjacent coprocessor
general registers. When the FR bit in the Status register equals one,
both even and odd register numbers are valid.

Operation: 



T: StoreFPR (fd, fmt, ValueFPR(fs, fmt) * ValueFPR(ft, fmt)) 



Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Unimplemented operation exception 
Invalid operation exception 
Inexact exception 
Overflow exception 
Underflow exception 
NEG.fmt  Floating-Point Negate  NEG.fmt

31      26 25      21 20      16 15      11 10       6 5        0
   COP1       fmt       00000       fs         fd        NEG
  010001                                                000111



Format: 



NEG.fmt fd, fs



Description: 

The contents of the FPU register specified by fs are interpreted in the
specified format and the arithmetic negation is taken (the polarity of
the sign bit is changed). The result is placed in the FPU register
specified by fd.

The negate operation is arithmetic; a NaN operand signals invalid
operation.

This instruction is valid only for single- or double-precision
floating-point formats. The operation is not defined if bit 0 of any
register specification is set and the FR bit in the Status register
equals zero, since the register numbers specify an even-odd pair of
adjacent coprocessor general registers. When the FR bit in the Status
register equals one, both even and odd register numbers are valid.

Operation: 



T: StoreFPR(fd, fmt, Negate(ValueFPR(fs, fmt)))



Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Unimplemented operation exception 
Invalid operation exception 
ROUND.L.fmt  Floating-Point Round to Long Fixed-Point Format  ROUND.L.fmt

31      26 25      21 20      16 15      11 10       6 5        0
   COP1       fmt       00000       fs         fd      ROUND.L
  010001                                                001000
     6         5          5          5          5          6



Format: 

ROUND.L.fmt fd, fs 
Description: 

The contents of the floating-point register specified by fs are
interpreted in the specified source format, fmt, and arithmetically
converted to the long fixed-point format. The result is placed in the
floating-point register specified by fd.

Regardless of the setting of the current rounding mode, the conversion
is rounded as if the current rounding mode is round to nearest/even
(RM = 0).

This instruction is valid only for conversion from single-, double-,
extended or quad-precision floating-point formats. If extended or
quad-precision format is specified, the operation is not defined if
bit 0 of the source register specification is set, since the register
number specifies an aligned coprocessor general register.

When the source operand is an Infinity or NaN, or the correctly rounded
integer result is outside of -2^63 to 2^63 - 1, the Invalid operation
exception is raised. If the Invalid operation is not enabled then no
exception is taken and 2^63 - 1 is returned.

This instruction is not implemented on MIPS I or MIPS II processors,
and will cause an unimplemented operation exception to occur.
ROUND.L.fmt  Floating-Point Round to Long Fixed-Point Format  ROUND.L.fmt
(continued)



Operation: 



T: StoreFPR(fd, L, ConvertFmt(ValueFPR(fs, fmt), fmt, L))



Exceptions: 

Coprocessor unusable exception 

Coprocessor Interrupt (R2000, R3000, or R6000) 

Floating-Point exception (R4000) 

Coprocessor Exceptions: 

Invalid operation exception 
Unimplemented operation exception 
Inexact exception 
Overflow exception 
ROUND.W.fmt  Floating-Point Round to Single Fixed-Point Format  ROUND.W.fmt

31      26 25      21 20      16 15      11 10       6 5        0
   COP1       fmt       00000       fs         fd      ROUND.W
  010001                                                001100
     6         5          5          5          5          6



Format: 

ROUND.W.fmt fd, fs 
Description: 

The contents of the floating-point register specified by fs are
interpreted in the specified source format, fmt, and arithmetically
converted to the single fixed-point format. The result is placed in the
floating-point register specified by fd.

Regardless of the setting of the current rounding mode, the conversion
is rounded as if the current rounding mode is round to nearest/even
(RM = 0).

This instruction is valid only for conversion from single- or
double-precision floating-point formats. The operation is not defined
if bit 0 of any register specification is set and the FR bit in the
Status register equals zero, since the register numbers specify an
even-odd pair of adjacent coprocessor general registers. When the FR
bit in the Status register equals one, both even and odd register
numbers are valid.

When the source operand is an Infinity or NaN, or the correctly rounded
integer result is outside of -2^31 to 2^31 - 1, an Invalid operation
exception is raised. If Invalid operation is not enabled, then no
exception is taken and 2^31 - 1 is returned.
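
The rounding and range rules above can be illustrated with a short Python sketch. It is a behavioral model under stated assumptions, not the FPU's implementation: the function name round_w and the (result, invalid) return pair are hypothetical, and the "invalid" flag stands in for the Invalid operation condition with the exception not enabled. Python's built-in round() rounds halfway cases to the nearest even integer, which matches the round to nearest/even mode described here.

```python
import math

INT32_MIN = -2**31
INT32_MAX = 2**31 - 1

def round_w(value):
    """Model ROUND.W: return (result, invalid) for a float operand."""
    if math.isnan(value) or math.isinf(value):
        return INT32_MAX, True          # Infinity or NaN: invalid, 2^31 - 1
    rounded = round(value)              # halfway cases go to the even integer
    if rounded < INT32_MIN or rounded > INT32_MAX:
        return INT32_MAX, True          # out of range: invalid, 2^31 - 1
    return rounded, False
```

For example, 2.5 converts to 2 and 3.5 converts to 4, while an operand of 2^31 is out of range and yields 2^31 - 1 with the invalid flag set.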
ROUND.W.fmt  Floating-Point Round to Single Fixed-Point Format  ROUND.W.fmt
(continued)

Operation: 



T: StoreFPR(fd, W, ConvertFmt(ValueFPR(fs, fmt), fmt, W))



Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Invalid operation exception 
Unimplemented operation exception 
Inexact exception 
Overflow exception 
SDC1  Store Doubleword from FPU (coprocessor 1)  SDC1

31      26 25      21 20      16 15                             0
   SDC1       base       ft                  offset
  111101
     6         5          5                    16





Format: 

SDC1 ft, offset(base)
Description: 

The 16-bit offset is sign-extended and added to the contents of general
register base to form an unsigned effective address.

In 32-bit mode, the contents of registers ft and ft+1 from the
floating-point coprocessor are stored at the memory location specified
by the effective address. This instruction is not valid, and is
undefined, when the least significant bit of ft is non-zero.

In 64-bit mode, the 64-bit register ft is stored to the doubleword at
the memory location specified by the effective address. The FR bit of
the Status register (SR_26) specifies whether all 32 registers of the
R4000 are addressable. When FR=0, this instruction is not defined if
the least significant bit of ft is non-zero. If FR=1, ft may specify
either odd or even registers.

If any of the three least-significant bits of the effective address are
non-zero, an address error exception takes place.
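
The address calculation and alignment rule can be sketched as follows. This is a minimal illustration, assuming a plain integer model of the registers; the names sign_extend16 and sdc1_effective_address and the address_bits parameter are hypothetical and not part of the instruction definition.

```python
def sign_extend16(offset):
    """Interpret the low 16 bits of offset as a signed value."""
    offset &= 0xFFFF
    return offset - 0x10000 if offset & 0x8000 else offset

def sdc1_effective_address(base, offset, address_bits=32):
    """Return (vaddr, address_error) for SDC1 ft, offset(base).

    The sum wraps to address_bits so the result is an unsigned address.
    The address error flag is set when any of the three
    least-significant bits of the effective address is non-zero.
    """
    vaddr = (base + sign_extend16(offset)) & ((1 << address_bits) - 1)
    return vaddr, (vaddr & 0x7) != 0
```

An offset of 0xFFF8 is treated as -8, so a base of 0x1000 gives the doubleword-aligned address 0x0FF8, whereas an offset of 4 from the same base would raise the address error.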
SDC1  Store Doubleword from FPU (coprocessor 1)  SDC1
(continued)
Operation: 



32   T:  vAddr <- ((offset_15)^16 || offset_15..0) + GPR[base]
         (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
         if BigEndianCPU = 1 then
             StoreMemory (uncached, WORD, CPR[1,ft+1],
                          pAddr+0, vAddr+0, DATA)
             StoreMemory (uncached, WORD, CPR[1,ft+0],
                          pAddr+4, vAddr+4, DATA)
         else
             StoreMemory (uncached, WORD, CPR[1,ft+0],
                          pAddr+0, vAddr+0, DATA)
             StoreMemory (uncached, WORD, CPR[1,ft+1],
                          pAddr+4, vAddr+4, DATA)
         endif

64   T:  vAddr <- ((offset_15)^48 || offset_15..0) + GPR[base]
         (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
         if SR_26 = 1 then
             data <- CPR[1,ft]
         else
             data <- CPR[1,ft_4..1 || 0]
         endif
         StoreMemory (uncached, DOUBLEWORD, data, pAddr, vAddr, DATA)



Exceptions: 

Coprocessor unusable 
TLB refill exception 
TLB invalid exception 
TLB modification exception 
Bus error exception 
Address error exception 
SQRT.fmt  Floating-Point Square Root  SQRT.fmt

31      26 25      21 20      16 15      11 10       6 5        0
   COP1       fmt       00000       fs         fd        SQRT
  010001                                                000100
     6         5          5          5          5          6



Format: 

SQRT.fmt fd, fs 
Description: 

The contents of the floating-point register specified by fs are
interpreted in the specified format and the positive arithmetic square
root is taken. The result is calculated as if to infinite precision and
then rounded to the specified format, according to the current rounding
mode. If the value of fs corresponds to -0, the result will be -0. The
result is placed in the floating-point register specified by fd.
This instruction is valid only for single- or double-precision
floating-point formats.

The operation is not defined if bit 0 of any register specification is
set and the FR bit in the Status register equals zero, since the
register numbers specify an even-odd pair of adjacent coprocessor
general registers. When the FR bit in the Status register equals one,
both even and odd register numbers are valid.

Operation: 



T: StoreFPR(fd, fmt, SquareRoot(ValueFPR(fs, fmt))) 



Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Unimplemented operation exception 
Invalid operation exception 
Inexact exception 
SUB.fmt  Floating-Point Subtract  SUB.fmt

31      26 25      21 20      16 15      11 10       6 5        0
   COP1       fmt        ft         fs         fd        SUB
  010001                                                000001
     6         5          5          5          5          6




Format: 


SUB.fmt fd, fs, ft 













Description: 

The contents of the floating-point registers specified by fs and ft are
interpreted in the specified format and arithmetically subtracted. The
result is calculated as if to infinite precision and then rounded to
the specified format, according to the current rounding mode. The
result is placed in the floating-point register specified by fd.
This instruction is valid only for single- or double-precision
floating-point formats.

The operation is not defined if bit 0 of any register specification is
set and the FR bit in the Status register equals zero, since the
register numbers specify an even-odd pair of adjacent coprocessor
general registers. When the FR bit in the Status register equals one,
both even and odd register numbers are valid.

Operation: 



T: StoreFPR (fd, fmt, ValueFPR(fs, fmt) - ValueFPR (ft, fmt)) 



Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Unimplemented operation exception 
Invalid operation exception 
Inexact exception 
Overflow exception 
Underflow exception 
SWC1  Store Word from FPU (coprocessor 1)  SWC1

31      26 25      21 20      16 15                             0
   SWC1       base       ft                  offset
  111001
     6         5          5                    16







Format: 

SWC1 ft, offset(base)
Description: 

The 16-bit offset is sign-extended and added to the contents of general 
register base to form an unsigned effective address. The contents of 
register ft from the floating-point coprocessor are stored at the 
memory location specified by the effective address. 

The FR bit of the Status register specifies whether all 64-bit
Floating-Point Registers are addressable. If FR equals zero, SWC1
stores either the high or low half of the 16 even Floating-Point
Registers. If FR equals one, SWC1 stores the low 32 bits of both even
and odd Floating-Point Registers.

If either of the two least-significant bits of the effective address is
non-zero, an address error exception occurs.
SWC1  Store Word from FPU (coprocessor 1)  SWC1
(continued)



Operation: 



32   T:  vAddr <- ((offset_15)^16 || offset_15..0) + GPR[base]
         (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
         data <- CPR[1,ft]
         StoreMemory (uncached, WORD, data, pAddr, vAddr, DATA)

64   T:  vAddr <- ((offset_15)^48 || offset_15..0) + GPR[base]
         (pAddr, uncached) <- AddressTranslation (vAddr, DATA)
         pAddr <- pAddr_PSIZE-1..3 || (pAddr_2..0 xor (ReverseEndian || 0^2))
         byte <- vAddr_2..0 xor (BigEndianCPU || 0^2)
         if SR_26 = 1 then
             data <- CPR[1,ft]_63-8*byte..0 || 0^(8*byte)
         else if ft_0 = 0 then
             data <- CPR[1,ft_4..1 || 0]_63-8*byte..0 || 0^(8*byte)
         else
             data <- 0^(32-8*byte) || CPR[1,ft_4..1 || 0]_63..32-8*byte
         endif
         StoreMemory (uncached, WORD, data, pAddr, vAddr, DATA)



Exceptions: 

Coprocessor unusable 
TLB refill exception 
TLB invalid exception 
TLB modification exception 
Bus error exception 
Address error exception 
TRUNC.L.fmt  Floating-Point Truncate to Long Fixed-Point Format  TRUNC.L.fmt

31      26 25      21 20      16 15      11 10       6 5        0
   COP1       fmt       00000       fs         fd      TRUNC.L
  010001                                                001001



Format: 



TRUNC.L.fmt fd, fs



Description: 

The contents of the floating-point register specified by fs are
interpreted in the specified source format, fmt, and arithmetically
converted to the long fixed-point format. The result is placed in the
floating-point register specified by fd.

Regardless of the setting of the current rounding mode, the conversion
is rounded as if the current rounding mode is round toward zero
(RM = 1).

This instruction is valid only for conversion from single-, double-,
extended or quad-precision floating-point formats. If extended or
quad-precision format is specified, the operation is not defined if
bit 0 of the source register specification is set, since the register
number specifies an aligned coprocessor general register.

When the source operand is an Infinity or NaN, or the correctly rounded
integer result is outside of -2^63 to 2^63 - 1, the Invalid operation
exception is raised. If the Invalid operation is not enabled then no
exception is taken and 2^63 - 1 is returned.

This instruction is not implemented on MIPS I or MIPS II processors,
and will cause an unimplemented operation exception to occur.
TRUNC.L.fmt  Floating-Point Truncate to Long Fixed-Point Format  TRUNC.L.fmt
(continued)



Operation: 



StoreFPR(fd, L, ConvertFmt(ValueFPR(fs, fmt), fmt, L)) 



Exceptions: 

Coprocessor unusable exception 

Coprocessor Interrupt (R2000, R3000, or R6000) 

Floating-Point exception (R4000) 

Coprocessor Exceptions: 

Invalid operation exception 
Unimplemented operation exception 
Inexact exception 
Overflow exception 
TRUNC.W.fmt  Floating-Point Truncate to Single Fixed-Point Format  TRUNC.W.fmt

31      26 25      21 20      16 15      11 10       6 5        0
   COP1       fmt       00000       fs         fd      TRUNC.W
  010001                                                001101



Format: 



TRUNC.W.fmt fd, fs 



Description: 

The contents of the FPU register specified by fs are interpreted in the
specified source format fmt and arithmetically converted to the single
fixed-point format. The result is placed in the FPU register specified
by fd.

Regardless of the setting of the current rounding mode, the
conversion is rounded as if the current rounding mode is round
toward zero (RM = 1).

This instruction is valid only for conversion from single- or
double-precision floating-point formats. The operation is not defined
if bit 0 of any register specification is set and the FR bit in the
Status register equals zero, since the register numbers specify an
even-odd pair of adjacent coprocessor general registers. When the FR
bit in the Status register equals one, both even and odd register
numbers are valid.

When the source operand is an Infinity or NaN, or the correctly
rounded integer result is outside of -2^31 to 2^31 - 1, an Invalid
operation exception is raised. If Invalid operation is not enabled,
then no exception is taken and -2^31 is returned.
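
A small Python sketch of the truncation rule follows. It is only a behavioral illustration: the name trunc_w and the (result, invalid) return pair are hypothetical, and the value returned in the invalid case follows the text above (-2^31), with the exception assumed to be disabled.

```python
import math

INT32_MIN = -2**31
INT32_MAX = 2**31 - 1

def trunc_w(value):
    """Model TRUNC.W: return (result, invalid) for a float operand."""
    if math.isnan(value) or math.isinf(value):
        return INT32_MIN, True      # Infinity or NaN: invalid, -2^31
    truncated = math.trunc(value)   # round toward zero drops the fraction
    if truncated < INT32_MIN or truncated > INT32_MAX:
        return INT32_MIN, True      # out of range: invalid, -2^31
    return truncated, False
```

Rounding toward zero means 2.9 becomes 2 and -2.9 becomes -2, regardless of the rounding mode currently selected.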
TRUNC.W.fmt  Floating-Point Truncate to Single Fixed-Point Format  TRUNC.W.fmt
(continued)



Operation: 



T: StoreFPR(fd, W, ConvertFmt(ValueFPR(fs, fmt), fmt, W)) 



Exceptions: 

Coprocessor unusable exception 
Floating-Point exception 

Coprocessor Exceptions: 

Invalid operation exception 
Unimplemented operation exception 
Inexact exception 
Overflow exception 
FPU Instruction Opcode Bit Encoding




Figure B-5 Bit Encoding for FPU Instructions 
                                    function
        2..0
5..3    0          1          2          3          4          5          6          7
  0     ADD        SUB        MUL        DIV        SQRT       ABS        MOV        NEG
  1     ROUND.L η  TRUNC.L η  CEIL.L η   FLOOR.L η  ROUND.W    TRUNC.W    CEIL.W     FLOOR.W
  2     δ          δ          δ          δ          δ          δ          δ          δ
  3     δ          δ          δ          δ          δ          δ          δ          δ
  4     CVT.S      CVT.D      δ          δ          CVT.W      CVT.L η    δ          δ
  5     δ          δ          δ          δ          δ          δ          δ          δ
  6     C.F        C.UN       C.EQ       C.UEQ      C.OLT      C.ULT      C.OLE      C.ULE
  7     C.SF       C.NGLE     C.SEQ      C.NGL      C.LT       C.NGE      C.LE       C.NGT

Figure B-6 Bit Encoding for FPU Instructions (cont.)

Key:

γ   Operation codes marked with a gamma cause a reserved instruction
    exception. They are reserved for future versions of the
    architecture.

δ   Operation codes marked with a delta cause unimplemented operation
    exceptions in all current implementations and are reserved for
    future versions of the architecture.

η   Valid for 64-bit mode only.
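
The function-field table lends itself to a simple lookup by row (bits 5..3) and column (bits 2..0). The following Python sketch is illustrative only: the names FUNCTION_TABLE and decode_function are hypothetical, and the entries marked with a delta are represented as None.

```python
D = None  # placeholder for codes marked with a delta in the figure

# Rows are function bits 5..3, columns are function bits 2..0.
FUNCTION_TABLE = [
    ["ADD", "SUB", "MUL", "DIV", "SQRT", "ABS", "MOV", "NEG"],
    ["ROUND.L", "TRUNC.L", "CEIL.L", "FLOOR.L",
     "ROUND.W", "TRUNC.W", "CEIL.W", "FLOOR.W"],
    [D] * 8,
    [D] * 8,
    ["CVT.S", "CVT.D", D, D, "CVT.W", "CVT.L", D, D],
    [D] * 8,
    ["C.F", "C.UN", "C.EQ", "C.UEQ", "C.OLT", "C.ULT", "C.OLE", "C.ULE"],
    ["C.SF", "C.NGLE", "C.SEQ", "C.NGL", "C.LT", "C.NGE", "C.LE", "C.NGT"],
]

def decode_function(instruction):
    """Return the mnemonic for the low six bits of a COP1 instruction word."""
    function = instruction & 0x3F
    return FUNCTION_TABLE[function >> 3][function & 0x7]
```

The function codes given on the individual instruction pages (for example 000010 for MUL and 001101 for TRUNC.W) index directly into this table.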
C



Single Error Correcting/Double Error Detecting Codes 

The ECC codes chosen for the processor's secondary cache data and
secondary cache tag are single error correcting, double error detecting
codes that also detect three- or four-bit errors within a nibble. These
codes were developed from codes proposed by M. Y. Hsiao in his paper,
"A Class of Optimal Minimum Odd-weight-column SECDED Codes". The
64-bit data code is a modification of one of the 64-bit codes proposed
by Hsiao to include the ability to detect three- and four-bit errors
within a nibble. The 25-bit tag code was created using the patterns
observed in the 64-bit data code.

The data code has the following properties:

1. It is a single error correcting, double error and three or four bit
   error within a nibble detecting code.

2. It provides 64 data bits protected by 8 check bits yielding 8-bit
   syndromes.

3. It is minimal in that each parity tree used to generate the
   syndrome has only 27 inputs, the minimum possible number.

4. It provides byte XORs of the data bits as part of the XOR trees
   used to build the parity generators. This allows picking byte
   parity out of the XOR trees that generate or check the code.

5. Single bit errors are indicated by syndromes that contain exactly
   3 ones or by syndromes that contain exactly 5 ones in which bits
   0-3 or bits 4-7 of the syndrome are all one. This makes it
   possible to decode the syndrome to find which data bit is in error
   with 4-input NAND gates, provided a pre-decode AND of bits 0-3 and
   bits 4-7 of the syndrome is available. For the check bits, a full
   8-bit decode of the syndrome is required.

6. Double bit errors are indicated by syndromes that contain an 
even number of ones. 

7. Three bit errors within a nibble are indicated by syndromes 
that contain 5 ones in which bits 0-3 of the syndrome and bits 
4-7 of the syndrome are not all one. 

8. Four bit errors within a nibble are indicated by syndromes 
that contain 4 ones. Because this is an even number of ones, 
four bit errors within a nibble look like double bit errors. 
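
The syndrome properties listed above for the data code can be summarized as a classification routine. The Python sketch below is an interpretation of those properties rather than the processor's decode logic: the function name and the returned strings are hypothetical, a syndrome with a single one is treated as a check bit error, and any odd pattern not covered by the list is reported as "unclassified", which is an assumption of this sketch.

```python
def classify_data_syndrome(syndrome):
    """Classify an 8-bit data ECC syndrome by its number of ones."""
    syndrome &= 0xFF
    ones = bin(syndrome).count("1")
    # Pre-decode: is either nibble of the syndrome all ones?
    nibble_all_ones = (syndrome & 0x0F) == 0x0F or (syndrome & 0xF0) == 0xF0
    if ones == 0:
        return "no error"
    if ones % 2 == 0:
        # Four-bit errors within a nibble look like double bit errors.
        return "double bit error"
    if ones == 1:
        return "single check bit error"
    if ones == 3 or (ones == 5 and nibble_all_ones):
        return "single data bit error"
    if ones == 5:
        return "three bit error within a nibble"
    return "unclassified"
```

Only the two "single" cases are correctable; every other non-zero class is detected but cannot be repaired.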

The tag code has the following properties: 

1. It is a single error correcting, double error and three or four bit
   error within a nibble detecting code.

2. It provides 25 data bits protected by 7 check bits yielding 7-bit
   syndromes.

3. It provides byte XORs of the data bits as part of the XOR trees
   used to build the parity generators. This allows picking byte
   parity out of the XOR trees that generate or check the code.

4. Single bit errors are indicated by syndromes that contain exactly
   3 ones. This makes it possible to decode the syndrome to find
   which data bit is in error with 3-input NAND gates. For the check
   bits, a full 7-bit decode of the syndrome is required.

5. Double bit errors are indicated by syndromes that contain an
   even number of ones.

6. Three bit errors within a nibble are indicated by syndromes
   that contain 5 ones or 7 ones.

7. Four bit errors within a nibble are indicated by syndromes
   that contain 4 ones or 6 ones. Because these are even numbers
   of ones, four bit errors within a nibble look like double bit
   errors.

The parity check matrices for the data ECC code and the tag ECC code,
specifying the distribution of data and check bits across nibbles, are
shown in Figure C-7 and Figure C-8. The Check Bits in Figure C-7
correspond to SysADC(7:0), SCDChk(15:8), or SCDChk(7:0). The Data
Bits in Figure C-7 correspond to SysAD(63:0), SCData(127:64), or
SCData(63:0). The Check Bits in Figure C-8 correspond to SCTChk(6:0)
and the Data Bits in Figure C-8 correspond to SCTag(24:0).

The Parity Check Matrices in Figure C-7 and Figure C-8 are used to
generate the ECC code for a fixed-width data word. The Parity Check
Matrices can also be used to find the data bit that is in error. In
Figure C-7, the data word is 64 bits and in Figure C-8, the data word
is 25 bits.



ECC Check Code Generation 



An 8-bit ECC check code is generated in the following manner. The
state, either 1 or 0, of each bit of the ECC check code is determined
by generating even parity for a selected group of data bits. The state
of an even parity bit is a "1" if there is an odd number of "1s" in the
selected group and a "0" if the group is all "0s" or if there is an
even number of "1s" in the group.

For each bit of the ECC check code (1 ECC code bit per row), the
selected group of data bits consists of all of the data and check bits
in which there is a "1" in the data bit or check bit column for that
row. If a check bit is used to generate the ECC check code bit, assume
that it has a "0" state. The "." represents a "0" and means that this
particular data or check bit is not used to generate this ECC code bit.
For example, if Data(63:0) = 0x0000 0000 0000 0000 then the ECC check
code is 0000 0000; if Data(63:0) = 0x0000 0000 0000 0001 then the ECC
check code is 0001 0011.



Determining Single Data Bit Errors 



The following procedure is used to determine which single data bit is
in error. Assume a system transmitted a 64-bit doubleword and 8 bits
of ECC. To verify proper transmission of the 64-bit doubleword and
8-bit ECC check code, the receiving system generates an 8-bit ECC check
code from the received 64-bit doubleword. The receiving system then
exclusive ORs the received check bits with the newly generated ECC
check bits. The result of this exclusive OR is called the syndrome. If
the syndrome is 0000 0000, it indicates that the received word and
newly generated ECC check bits are the same as the transmitted word
and check bits. If the syndrome corresponds to any of the syndromes
in Figure C-7, the data bit or check bit that corresponds to this
syndrome is the bit in error. A quick way to determine if there is a
match is to look at the number of ones in the syndrome. If the
syndrome contains either three or five ones, the syndrome is in
Figure C-7. If the syndrome contains a single one, the erroneous bit is
an ECC check bit. If the syndrome is contained in Figure C-7, the bit
that is in error can be corrected by inverting the state of that bit.
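
The check-and-correct procedure can be sketched in Python as follows. This is a partial illustration, not a full decoder: the lookup table is hypothetical and holds only the one syndrome quoted in this appendix (0001 0011 for data bit 0), and the regenerated check code is passed in as a parameter because the sketch does not implement the parity check matrix of Figure C-7.

```python
# Hypothetical partial table: syndrome -> data bit position.
# Only the column quoted in the text is filled in.
SYNDROME_TO_DATA_BIT = {0b0001_0011: 0}

def check_and_correct(data, received_check, regenerated_check):
    """Return (data, status) after applying the syndrome procedure."""
    # The syndrome is the XOR of received and regenerated check bits.
    syndrome = received_check ^ regenerated_check
    if syndrome == 0:
        return data, "ok"
    if syndrome in SYNDROME_TO_DATA_BIT:
        # Correct the error by inverting the state of the bit in error.
        return data ^ (1 << SYNDROME_TO_DATA_BIT[syndrome]), "corrected data bit"
    if bin(syndrome).count("1") == 1:
        return data, "check bit error"
    return data, "not in partial table"
```

For instance, receiving data 0x1 with check code 0000 0000, when the regenerated check code is 0001 0011, produces the syndrome 0001 0011 and the corrected data 0x0.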
The following examples show in what instances Parity Check Matrices
are used:

• Single Data bit error.

• Single Check bit error.

• Multiple Data bit errors (2 consecutive bits in a nibble)

• Multiple Data bit errors (3 consecutive bits in a nibble)

• Multiple Data bit errors (4 consecutive bits in a nibble)

Single Data Bit ECC Error 

A single data bit ECC error can be detected and corrected as follows.
Assume the data doubleword Data(63:0) = 0x0000 0000 0000 0000
with ECC check code 0000 0000 is transmitted and data doubleword
Data(63:0) = 0x0000 0000 0000 0001 with ECC check code 0000 0000 is
received. The receiving system will regenerate the ECC for the
received data. The ECC check code for Data(63:0) = 0x0000 0000 0000
0001 is 0001 0011. The syndrome is generated by the exclusive OR of
the received check bits, 0000 0000, and the regenerated check bits,
0001 0011. The resulting syndrome is 0001 0011. Since the syndrome has
three 1s, it is contained in the parity check matrix. Searching the
matrix (Figure C-7) shows that the syndrome, 0001 0011, corresponds to
data bit 0. This indicates that the state of the received data bit 0 is
incorrect. To correct the error, the system will invert the state of
the received data bit 0.

Single Check Bit ECC Error 

A single check bit ECC error can be detected and corrected as follows.
Assume the data doubleword Data(63:0) = 0x0000 0000 0000 0000
with ECC check code 0000 0000 is transmitted and data doubleword
Data(63:0) = 0x0000 0000 0000 0000 with ECC check code 0000 0001 is
received. The receiving system regenerates the ECC for the received
data. The ECC check code for Data(63:0) = 0x0000 0000 0000 0000 is
0000 0000. The syndrome is generated by the exclusive OR of the
received check bits, 0000 0001, and the regenerated check bits,
0000 0000. The resulting syndrome is 0000 0001. Since the syndrome has
one 1, it is contained in the parity check matrix. Searching this
matrix (Figure C-7) shows that the syndrome, 0000 0001, corresponds to
check bit 0. This indicates that the state of the received check bit 0
is incorrect. To correct the error, the system inverts the state of the
received check bit 0.
Multiple Data Bit ECC Errors

A multiple (2 bits within a nibble) data bit ECC error can be detected
as follows. Assume the data doubleword Data(63:0) = 0x0000 0000
0000 0000 with ECC check code 0000 0000 is transmitted and data
doubleword Data(63:0) = 0x0000 0000 0000 0011 with ECC check code
0000 0000 is received. The receiving system regenerates the ECC for
the received data. The ECC check code for Data(63:0) = 0x0000 0000
0000 0011 is 0011 0000. The syndrome is generated by the exclusive OR
of the received check bits, 0000 0000, and the regenerated check bits,
0011 0000. The resulting syndrome is 0011 0000. Since the syndrome
has two 1s (an even number of 1s), it indicates that a double bit
error has been detected. The double bit error cannot be corrected.

A multiple (3 bits within a nibble) data bit ECC error can be detected
as follows. Assume the data doubleword Data(63:0) = 0x0000 0000
0000 0000 with ECC check code 0000 0000 is transmitted and data
doubleword Data(63:0) = 0x0000 0000 0000 0111 with ECC check code
0000 0000 is received. The receiving system regenerates the ECC for
the received data. The ECC check code for Data(63:0) = 0x0000 0000
0000 0111 is 0111 0011. The syndrome is generated by the exclusive OR
of the received check bits, 0000 0000, and the regenerated check bits,
0111 0011. The resulting syndrome is 0111 0011. Since the resulting
syndrome has five 1s and no four of the 1s are contained in check bits
(7:4) or check bits (3:0), the user knows that 3 errors occurred within
a nibble. The triple bit error within a nibble cannot be corrected.

A multiple (4 bits within a nibble) data bit ECC error can be detected
as follows. Assume the data doubleword Data(63:0) = 0x0000 0000
0000 0000 with ECC check code 0000 0000 is transmitted and data
doubleword Data(63:0) = 0x0000 0000 0000 1111 with ECC check code
0000 0000 is received. The receiving system regenerates the ECC for
the received data. The ECC check code for Data(63:0) = 0x0000 0000
0000 1111 is 1111 0000. The syndrome is generated by the exclusive OR
of the received check bits, 0000 0000, and the regenerated check bits,
1111 0000. The resulting syndrome is 1111 0000. Since the resulting
syndrome has four 1s (an even number of 1s), this error looks like a
double bit error. The 4 bit errors within a nibble cannot be corrected.



25-Bit Parity Check Matrix 



This same procedure works for the 25-bit parity check matrix shown
in Figure C-8. The only differences are the number of check bits used
and the decoding of errors.
Figure C-7 Parity Check Matrix for the Data ECC Code

[Figure: parity check matrix for the data ECC code. Rows are the eight
ECC code bits, MSB to LSB; columns are the check bits and data bits
63..0. Each row contains 27 ones. A final row gives the number of ones
in the syndrome* for each column.]

* This row indicates the number of ones in the generated syndrome,
for each data bit in error.



Figure C-8 Parity Check Matrix for the Tag ECC Code

[Figure: parity check matrix for the tag ECC code. Rows are the seven
ECC code bits, MSB to LSB; columns are the check bits and data bits
24..0. The seven rows contain 11, 13, 10, 10, 13, 11, and 14 ones
respectively. A final row gives the number of ones in the syndrome*
for each column.]

* This row indicates the number of ones in the generated syndrome,
for each data bit in error.
D



Sub-block ordering is an order for transmitting the data elements that 
form the block of data when the data element transmitted first is not 
the data element at the beginning of the block. Sub-block ordering 
causes the data elements of the block to be transmitted in an order that 
fills out sub-blocks of increasing size. For the R4000, the smallest data 
elements of a block transfer is a double word; therefore, the double 
word at the target address is transferred first, followed by the double 
word that fills out the quad word that contains the starting double 
word. Next, the quad word that fills out an octal word containing the 
starting quad word is transferred in the same order as the first quad 
word, followed by the octal word that fills out a hex word containing 
the starting octal word in the same order as the first octal word, and 
so on through sub-blocks of increasing size until the entire block has 
been transferred. 

Perhaps an easier way to consider sub-block ordering is to look at a
method for generating the addresses, within the block, of the double
words to be transferred. A simple method for generating such
addresses is to bit-wise XOR the starting double word address with
the output of a binary counter that counts the double words in the
block, starting at double word zero.

Table D-1 through Table D-3 illustrate the sequence of double words
transferred using sub-block ordering for a thirty-two word block
based on three different starting block addresses. For these
illustrations, the double words in the block are identified by their
block addresses. The block address for each double word in a block is
derived by numbering the double words in the block sequentially,
starting with zero.

The tables also include a binary count of the double words in the block
to illustrate the XOR relationship between this count, the starting
address, and the block addresses of the double words transferred.
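
The XOR method described above can be sketched in a few lines of Python. This is an illustration only, not part of the manual: the function name `sub_block_order` and its parameters are hypothetical, and the block is assumed to hold a power-of-two number of double words (16 for the thirty-two word block used in the tables).

```python
def sub_block_order(start: int, n_doublewords: int = 16) -> list[int]:
    """Return the block addresses of the double words in sub-block order.

    start          block address of the double word transferred first
    n_doublewords  number of double words in the block (assumed power of two)
    """
    if n_doublewords <= 0 or n_doublewords & (n_doublewords - 1):
        raise ValueError("block size must be a power of two")
    if not 0 <= start < n_doublewords:
        raise ValueError("start must lie inside the block")
    # XOR the starting address with a binary counter running from zero.
    return [start ^ count for count in range(n_doublewords)]


if __name__ == "__main__":
    # Print cycle number and block address for a starting address of 0010.
    for cycle, addr in enumerate(sub_block_order(0b0010), start=1):
        print(f"{cycle:2d}  {addr:04b}")
```

Because XOR with a fixed value is a permutation, every double word in the block is produced exactly once, whatever the starting address.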



D_2 R4000 User's Manual-Preliminary 



Sub-block Ordering 



Table D-1 Sequence of Double Words Transferred Using Sub-block Ordering: 0010

Cycle   Starting Block Address   Binary Count   Double Word Transferred
 1      0010                     0000           0010
 2      0010                     0001           0011
 3      0010                     0010           0000
 4      0010                     0011           0001
 5      0010                     0100           0110
 6      0010                     0101           0111
 7      0010                     0110           0100
 8      0010                     0111           0101
 9      0010                     1000           1010
10      0010                     1001           1011
11      0010                     1010           1000
12      0010                     1011           1001
13      0010                     1100           1110
14      0010                     1101           1111
15      0010                     1110           1100
16      0010                     1111           1101



Table D-2 Sequence of Double Words Transferred Using Sub-block Ordering: 1011

Cycle   Starting Block Address   Binary Count   Double Word Transferred
 1      1011                     0000           1011
 2      1011                     0001           1010
 3      1011                     0010           1001
 4      1011                     0011           1000
 5      1011                     0100           1111
 6      1011                     0101           1110
 7      1011                     0110           1101
 8      1011                     0111           1100
 9      1011                     1000           0011
10      1011                     1001           0010
11      1011                     1010           0001
12      1011                     1011           0000
13      1011                     1100           0111
14      1011                     1101           0110
15      1011                     1110           0101
16      1011                     1111           0100
Table D-3 Sequence of Double Words Transferred Using Sub-block Ordering: 0101

Cycle   Starting Block Address   Binary Count   Double Word Transferred
 1      0101                     0000           0101
 2      0101                     0001           0100
 3      0101                     0010           0111
 4      0101                     0011           0110
 5      0101                     0100           0001
 6      0101                     0101           0000
 7      0101                     0110           0011
 8      0101                     0111           0010
 9      0101                     1000           1101
10      0101                     1001           1100
11      0101                     1010           1111
12      0101                     1011           1110
13      0101                     1100           1001
14      0101                     1101           1000
15      0101                     1110           1011
16      0101                     1111           1010
The speed of the R4000 output drivers is controlled by a negative
feedback loop that ensures drive-off times are only as fast as necessary
to meet the system requirement of single cycle transfers. This
guarantees the minimum ground bounce due to the L * Δi/Δt of the
switching buffers, consistent with the system timing requirements.
Four bits are used to control each of the pull-up and pull-down delays.
These bits are initially set to the values in the mode bits InitN<3..0>
for pull-up and InitP<3..0> for pull-down.

Under normal conditions, the Δi/Δt control mechanism is expected to
be constantly enabled so that it can compensate the output buffer
delay for any changes in the temperature or power supply voltage.
The EnblDPLL mode bit should be set for this mode of operation.

For situations where the jitter associated with the operation of the
Δi/Δt control mechanism cannot be tolerated and where the variation in
temperature and supply voltage after ColdReset is expected to be
small, the Δi/Δt control mechanism can be instructed to lock only
during ColdReset and thereafter retain its control values. The
EnblDPLLR mode bit should be set and the EnblDPLL mode bit
should be cleared for this mode of operation.

In addition, if both the EnblDPLL and EnblDPLLR mode bits are
cleared, the speed of the output buffers can be set with the InitP<3..0>
and InitN<3..0> mode bits.

The drive-off delays can be set through the mode bits. Currently,
delays of 0.5T, 0.75T, and T are supported, corresponding to the
Drv0_50, Drv0_75, and Drv1_00 mode bits, where T is the
MasterClock period. For example, in Drv0_75 mode, the entire signal
transmission path, including the clock-to-Q, output buffer drive time,
board flight time, input buffer delay, and setup time, will be traversed
in 0.75 * the MasterClock period, plus or minus the jitter due to the
Δi/Δt control mechanism.
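
As a worked illustration of the drive-off settings above, the following Python sketch converts a mode bit name and a MasterClock frequency into the nominal time budget for the whole signal path. The function name, the dictionary, and the 50 MHz figure used in the usage note are assumptions made for the example; the jitter term is not modeled.

```python
# Fraction of the MasterClock period allotted to the signal path,
# keyed by the drive-off mode bit named in the text.
DRIVE_OFF_FRACTION = {
    "Drv0_50": 0.50,
    "Drv0_75": 0.75,
    "Drv1_00": 1.00,
}


def path_budget_ns(mode: str, master_clock_mhz: float) -> float:
    """Nominal path budget in nanoseconds, ignoring control-loop jitter."""
    if master_clock_mhz <= 0:
        raise ValueError("MasterClock frequency must be positive")
    period_ns = 1000.0 / master_clock_mhz  # T, the MasterClock period
    return DRIVE_OFF_FRACTION[mode] * period_ns
```

With a hypothetical 50 MHz MasterClock (T = 20 ns), `path_budget_ns("Drv0_75", 50.0)` gives the budget that clock-to-Q, drive time, flight time, input buffer delay, and setup time must share.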

All output drivers on the R4000, with the exception of the clock
drivers, are controlled by the Δi/Δt control mechanism. The delay due
to the output buffer drive time component of the SCAddr<17..0>,
SCOEB, SCWRB, SCDCSB, and SCTCSB pins is approximately 66%
of the delay of drivers of the other pins.

The R4000 determines the worst case propagation delay from an
R4000 output driver to a receiving device by measuring the
transmission line delay of the trace that connects the R4000 IO_Out
and IO_In pins. This representative trace must have one and a half
times the length and approximately the same capacitive loading as the
worst case trace on any R4000 output.

The designer determines the trace characteristics by: 

• measuring the longest path from an R4000 output driver to 
a receiving device: L 

• calculating the maximum capacitive loading on any signal 
pin: C 

• connecting an incident-wave trace of length L with a 
capacitive loading of C between the IO_In and IO_Out 
pins of the R4000 

• connecting a reflected wave trace of length L/2 to the 
IO_In pin of the R4000. 
Output Buffer Di/Dt Control Mechanism

An R4000 with appropriate traces connected to the IO_In and IO_Out
pins is illustrated in Figure E-9.

Figure E-9 IO_In/IO_Out Board Trace

[Figure: drawing not recoverable from the source. Legible labels: CPU
Board; R4000; IO_Out; IO_In; Load; "Incident Wave" Trace (the longest
trace from an R4000 output driver to a receiving device); "Reflected
Wave" Trace, Length = L/2.]

L = a + b + c + d

C = Total Capacitance Loading of the worst case trace
PLL Passive Components



F



The Phase Locked Loop circuit requires several passive components
for proper operation. These passive components are connected to
PLLCap0, PLLCap1, VccP, and VssP, as illustrated in Figure F-10. In
addition, the capacitors for PLLCap0 (Cp) and PLLCap1 (Cp) can be
connected to either VssP (as shown), VccP, or one to VssP and one to
VccP. Note that C2 and both Cp capacitors are incorporated into both
the 179PGA and 447PGA package designs as surface-mounted chip
capacitors.

Figure F-10 PLL Passive Components

[Schematic not recoverable from the source. Legible labels: Vcc, VccP,
Cp, C2.]
A top view of the 179-pin package with caps looks like this:

[Layout drawing not recoverable from the source. It shows the die and
the capacitors identified in the legend below.]

x: Vss-Vcc Bypass Caps
C2: VssP-VccP Bypass Caps
%1, %2: PLL Caps
A top view of the 447-pin package with chip capacitors looks like this:

[Layout drawing not recoverable from the source. It shows the die and
the capacitors identified in the legend below.]

x: Vss-Vcc Bypass Caps
C2: VssP-VccP Bypass Caps
%1, %2: PLL Caps



It is essential to isolate the analog power and ground for the PLL
circuit (VccP/VssP) from the regular power and ground (Vcc/Vss).
Initial evaluations have yielded good results with the following
values:

R  = 5 ohms
C1 = 1 nF
C2 = 82 nF
C3 = 10 uF
Cp = 470 pF

Since the optimum values for the filter components depend upon the
application and the system noise environment, these values should be
considered as starting points for further experimentation within the
application-specific context. In addition, chokes (inductors: L) can
be considered for use as an alternative to the resistors (R) for power
supply filtering.
R4000 Coprocessor Hazards



G



R4000 Coprocessor hazards are listed in Table G-4. The following
notes apply:

α Status.EXL and Status.ERL are permanently cleared in
stage 8, but the effect of clearing them is visible to
instruction fetch starting in stage 4.

β Only one instruction needs to separate Index Load Tag
and MFC0 Tag, even though the stage numbers would imply
three instructions.

• The instruction following a MTC0 must not be a MFC0.

• The five instructions following a MTC0 to Status that
changes KSU and sets EXL or ERL may be executed in
the new mode, and not kernel mode. This can be
avoided by setting EXL first, leaving KSU set to kernel,
and then later changing KSU.

• There must be two non-load and non-CACHE
instructions between a store and a CACHE instruction
directed to the same primary cache line as the store.
Table G-4 R4000 Coprocessor Hazards

                             Source                            Destination
Operation                    Name                      Stage   Name                      Stage

MTC0                         gpr rt                    3       cpr rd                    7

MFC0                         cpr rd                    4       gpr rt                    7

TLBR                         Index, TLB                5-7     PageMask, EntryHi,        8
                                                               EntryLo0, EntryLo1

TLBWI, TLBWR                 Index or Random,          5-7     TLB                       8
                             PageMask, EntryHi,
                             EntryLo0, EntryLo1

TLBP                         PageMask, EntryHi         3-6     Index                     7

ERET                         EPC or ErrorEPC,          4       Status.EXL, Status.ERL    4-8α
                             Status, TLB                       LLbit                     7

CACHE Index Load Tag                                           TagLo, TagHi, ECC         8β

CACHE Index Store Tag        TagLo, TagHi, ECC         7

CACHE Hit ops                                                  Status.CH                 8

Instruction fetch            EntryHi.ASID,             [stage not legible in source]
                             Status.KSU, Status.EXL,
                             Status.ERL, Status.RE,
                             Config.K0C, Config.IB
                             Config.SB                 3
                             TLB                       2

Instruction fetch exception                                    EPC, Status               8
                                                               Cause, BadVAddr,          3
                                                               Context

Coprocessor usable test      Status.CU, Status.KSU,    2
                             Status.EXL, Status.ERL

Interrupt                    Cause.IP, Status.IM,      3
                             Status.IE, Status.EXL,
                             Status.ERL

Load/Store                   EntryHi.ASID,             4
                             Status.KSU, Status.EXL,
                             Status.ERL, Status.RE,
                             Config.K0C, Config.DB,
                             TLB
                             Config.SB                 7
                             WatchHi, WatchLo          4-5

Load/Store exception                                           EPC, Status, Cause,       8
                                                               BadVAddr, Context

TLB shutdown                                                   Status.TS                 7
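
The stage numbers in the hazard table can be turned into an instruction count. The sketch below assumes the rule that the number of instructions needed between a writer A and a reader B is the destination stage of A minus one more than the source stage of B. That rule is not stated in this appendix; it is adopted here as an assumption because it reproduces the "three instructions" that note β says the table implies for CACHE Index Load Tag (stage 8) followed by MFC0 (stage 4). The function name is hypothetical.

```python
def instructions_between(dest_stage_a: int, source_stage_b: int) -> int:
    """Instructions to place between writer A and reader B.

    Assumed rule: destination stage of A - (source stage of B + 1),
    never less than zero. Special cases called out in the notes
    (for example note β) override the computed value.
    """
    return max(0, dest_stage_a - (source_stage_b + 1))
```

For a stage range such as 5-7, a cautious reading is to use the earliest source stage and the latest destination stage, which gives the largest separation.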
9/11 /9:
R4000MC Errata, Processor Revision 1.1 and [illegible]

1. [Description illegible in the source.]

Workaround: There is no workaround for the problem when using the update protocol. Invalidate
protocol should be used in place of updates.

2. Replace this errata with the text in errata 5.

3. External requests may cause an incorrect state change in the secondary cache when the R4000
is in a data or instruction cache miss on a non-coherent page.

Workaround: Do not use the cacheable, noncoherent TLB page attribute in a multiprocessing
system.

4. [Description illegible in the source.]

Workaround: Update protocol should not be used for the store conditional instructions. Previous
errata prevent the use of the update protocol for TLB pages.

5. Store-load interlock breaks strong ordering. This situation can occur under the following
conditions:

1. [illegible] aliases to the same cache line location.

2. The store misses in the primary.

3. The stall sequence for the processor contains two stages, [illegible]
or the processor must issue an invalidate or update request.

The problem arises if an intervention request [illegible] the second stage of the
stall.

This may create two problems. First, [illegible]
the stall but before the second stage, will be squashed before the store [illegible]
stage of the stall. The intervention response will contain the new data, [illegible]
an update or invalidate which violates strong ordering. [illegible] data.
Workaround: Whenever a store to a sharable line is followed immediately by a load to a sharable
line and the store and load have the same primary cache address (that is, the 12 LSBs of the store
address and load address match), insert a noop (or a non-memory instruction) between the store and
load. A less fine-grained fix is to detect any store to a sharable line followed by a load to a
sharable line, without considering the address match, and insert the noop or non-memory
instruction.

New code sequence for general solution: 

STORE instruction<SHARABLE line> 

NOOP 

LOAD instruction<SHARABLE line> 
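
The address test in the workaround can be sketched as follows. This is an illustration only: the function name is hypothetical, and the caller is assumed to already know that both accesses target sharable lines and that the load immediately follows the store.

```python
PRIMARY_ADDRESS_MASK = 0xFFF  # the 12 least significant address bits


def same_primary_cache_address(store_addr: int, load_addr: int) -> bool:
    """True when the 12 LSBs of the store and load addresses match."""
    return ((store_addr ^ load_addr) & PRIMARY_ADDRESS_MASK) == 0
```

A tool applying the fine-grained fix would insert the noop only when this returns True; the coarser fix skips the check and always inserts it.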

6. With the system interface configured with ECC checking, the response to a Snoop request con- 
tains primary cache parity on the SysADC bus rather than ECC. 

Workaround: Since a snoop request does not return data, the information present on the SysAD 
and SysADC busses should be ignored even if incorrect data is present. 

7. If a store hit in the primary cache causes a state change stall and an external request invalidates 
the target of the store, the store will miss in and generate a read request for the target line. In this 
case, the check pins, SysADC, will not be driven during the read request. 

Workaround: During a read request, ignore the SysADC pins. 

8. Do not split the command and data cycles of an update cycle. Under certain conditions, the 
R4000 will prematurely take ownership of the system interface bus before the data cycle is issued 
to the R4000 if there are any cycles between the update command and data cycles. 

Workaround: Do not split the command and data cycles of an update cycle. 

9. Under the conditions listed below, the EB bit in the CacheErr register is incorrectly set. 

1) A store targets a shared line in the primary cache 

2) The tag in this line has a parity error 

3) Under this condition, the processor will stall due to a data cache miss and the CacheErr register 
is set. 

4) As the processor comes out of the data cache miss and before it vectors to the CacheErr excep- 
tion vector, there is an instruction cache miss and a pending external request which targets the same 
line with the parity error. 

Under these conditions, the EB bit will get set although there was no parity error. The EB bit im- 
plies that both a data and instruction parity error have occurred. In this case, there was only a data 
parity error. 

Workaround: The EB bit is meaningful only if the ER bit indicates instruction error. 

10. R4000PC, R4000SC: If a secondary cache line that is being replaced matches the address
currently stored for a load linked, the cache line retained bit may not always be set in the read
command issued for a cache refill.

Workaround: Software must guarantee that the load link address is not replaced by the instruction 
block that contains the load link instruction. 
Note: Change bars in the left column indicate corrections or changes from the last revision of the
errata.

1. R4000PC, R4000SC: Master/checker mode is not available. 

Workaround: The mode bits to enable Master/Checker operation should not be set to one. These 
mode bits are the MCMode and DataMaster modebits. Behavior of the R4000 is undefined when 
Master/Checker mode is enabled. 

2. R4000PC, R4000SC: Status output pins do not function. 

3. R4000PC, R4000SC: Reduced power mode is not available in the current revision of the R4000. 
Setting the RP bit in the Status register has no effect on the operation of the R4000. 

4. R4000PC, R4000SC (Note: Processor revision 3.0 does not contain this errata): An instruction
sequence which contains a load which causes a data cache miss and a jump, where the jump
instruction is the last instruction in the page and the delay slot of the jump is not currently mapped,
causes the exception vector to be overwritten by the jump address. The R4000 will use the jump
address as the exception vector.

Example:

lw        <-- data cache miss
noop      <-- one or two Noops
jr        <-- last instruction in the page (jump or branch instruction)
          <-- page boundary
noop

Workaround: Jump and branch instructions should never be in the last location of a page. 
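
A linker or assembler post-pass could enforce this workaround with an address check like the sketch below. The function name and the 4 KB default page size are assumptions for the example; MIPS instructions are 4 bytes long, so the last location of a page is the final word-aligned address inside it.

```python
INSTRUCTION_BYTES = 4  # every R4000 instruction is one 32-bit word


def is_last_instruction_in_page(addr: int, page_size: int = 4096) -> bool:
    """True when addr is the final instruction slot of its page.

    page_size is assumed to be a power of two; 4096 is used as the
    default because a slot that ends a larger page also ends a 4 KB page.
    """
    if page_size <= 0 or page_size & (page_size - 1):
        raise ValueError("page size must be a power of two")
    return (addr & (page_size - 1)) == page_size - INSTRUCTION_BYTES
```

If a jump or branch lands on such an address, padding with a noop before it moves the instruction into the next page.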

5. R4000SC: When the primary instruction cache is configured with an 8-word line size, the virtual 
coherency exception (VCE) does not function correctly. 

Workaround: The primary instruction cache should only be configured with a 4- word line size. 

6. R4000PC, R4000SC: The following conditions cause the R4000 to operate incorrectly: 

1. An exception is taken from user code 

2. On the eret instruction of the exception handler, a CacheErr exception is taken. 

The R4000 takes the CacheErr exception correctly but returns to user code instead of the ERET in 
the exception handler. This will then cause an Address Error exception. 

Workaround: Use the following code sequence as the last three instructions of the exception han- 
dler: 

eret 

noop 

eret 

The CacheErr handler must add 4 to the ErrorEPC to return to the noop. 

7. R4000PC (Note: Processor revision 3.0 does not contain this errata): When an external request
is placed between the read and write of a data cache replacement of a dirty cache line and an
uncached store is stalled in the WB pipeline stage, the R4000 will send out the command code for an
uncached store during the block write of the writeback.

Workaround: Interrupts should be signaled through the Int0 through Int5 pins on the R4000PC
package. An external null request should not be signaled between the write and read of the
replacement of a dirty cache line. In the Revision 2.2 R4000, there are four cycles between the write and
read of a dirty cache line replacement.

8. R4000PC, R4000SC: In supervisor mode, with the SX bit (64-bit mode enabled for supervisor
mode) in the Status register set to one, the R4000 will generate an Address Error Exception in the
"csseg" region: 0xFFFF FFFF C000 0000 to 0xFFFF FFFF DFFF FFFF.

9. R4000PC: If a writeback is delayed by an external request and the processor executes a cache 
operation which generates a writeback with the retained bit set on the system command bus, the 
retained bit will also be set for the first writeback. 

Workaround: Since the retained bit is intended for multiprocessing systems, R4000PC systems 
should ignore this bit on the system command bus. 

10. R4000PC, R4000SC: In kernel mode with the KX bit (64-bit mode enabled for kernel mode)
in the Status register set to one, the R4000 fails to generate an Address Error Exception if a load or
store is attempted in the region: 0xC000 0FFF F000 0000 to 0xFFFF FFFF 8000. Attempting
access to this region should cause an Address Error exception.

11. R4000PC, R4000SC: In the case: 

lw rA, (rn)

noop            (or any non-conflicting instruction)

lw rn, (rA)     (where the address in rA causes a TLB refill)

--> end of page

page not mapped

where rn and rA are general purpose registers r0 through r31.

This code sequence causes the second load instruction to slip due to a load use interlock. When
the R4000 crosses the page boundary after the lw, it vectors to 0x8000 0000 and later causes an
instruction cache miss. After the instruction cache miss is complete, the lw causes another TLB
refill. This should vector to 0x8000 0000 but instead goes to 0x8000 0180.

Workaround: In the general exception handler, the CAUSE register must be checked to see if a 
TLB miss is indicated. 
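
The check described in this workaround amounts to decoding the ExcCode field of the Cause register. The sketch below models it in Python for illustration; the function names are hypothetical. ExcCode occupies bits 6..2 of Cause, and the values 2 (TLBL, load or instruction fetch) and 3 (TLBS, store) identify TLB exceptions.

```python
EXC_TLBL = 2  # TLB exception on a load or instruction fetch
EXC_TLBS = 3  # TLB exception on a store


def exc_code(cause: int) -> int:
    """Extract the ExcCode field (bits 6..2) from a Cause register value."""
    return (cause >> 2) & 0x1F


def tlb_miss_indicated(cause: int) -> bool:
    """True when Cause reports a TLB exception (TLBL or TLBS)."""
    return exc_code(cause) in (EXC_TLBL, EXC_TLBS)
```

The same two codes are also reported for TLB invalid exceptions, so a real handler would still have to decide whether a refill is needed, for example by probing the TLB.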

12. R4000SC: The following conditions cause the virtual address in the BadVA register to be cor- 
rupted by a TLB Probe instruction (TLBP) 

The problem occurs if: 

1. An exception is generated 

2. A TLBProbe is required to handle the exception 

3. While handling the exception, a VCED or VCEI occurs, the BadVA may be incorrect. 

Workaround: The exception handler should jump to uncached space to handle the TLBProbe. 

13. R4000SC: The cacheop Create_DEX_SC (secondary cache operation), when executed on a
non-coherent line, will change the state of the secondary line to Clean Exclusive instead of Dirty
Exclusive.

Workaround: This situation should not create any problem in normal operation since a subsequent 
store will change the line to Dirty Exclusive. 

14. This errata is an update to errata 4. The incorrect behavior does not occur with two Noops be- 
tween the load and jump. There is a qualifying condition on the load and jump instructions and 
several similar cases are listed which cause the same error. 

R4000PC, R4000SC: An instruction sequence which contains a load which causes a data cache
miss and a jump, where the jump instruction is the last instruction in the page and the delay slot
of the jump is not currently mapped, causes the exception vector to be overwritten by the jump
address. The R4000 will use the jump address as the exception vector. In the first case, the target of
the load instruction and source register for the jump instruction must be the same register. In all
other cases, the condition is independent of the registers used.

Example:

lw rA,(rn)    <-- data cache miss
noop          <-- one Noop
jr rA         <-- jump or branch instruction as the last instruction in the page
              <-- page boundary
PAGE NOT MAPPED

lw            <-- data cache miss
div           <-- signed, unsigned and doubleword integer divide
beq           <-- branch instruction as the last instruction in the page
              <-- page boundary
PAGE NOT MAPPED

sw            <-- data cache miss
div           <-- signed, unsigned and doubleword integer divide
beq           <-- branch instruction as the last instruction in the page
              <-- page boundary
PAGE NOT MAPPED

cacheop       <-- data cache miss
div           <-- signed, unsigned and doubleword integer divide
beq           <-- branch instruction as the last instruction in the page
              <-- page boundary
PAGE NOT MAPPED

Workaround: Jump and branch instructions should never be in the last location of a page. 

15. R4000PC, R4000SC: This errata was deleted. The problem does not occur on the R4000. 

16. R4000PC, R4000SC: Please refer to errata 28 for an update to this errata description. 

The following code sequence causes the R4000 to incorrectly execute the Double Shift Right
Arithmetic 32 (dsra32) instruction. If the dsra32 instruction is executed during an integer
multiply, the dsra32 will only shift by the amount specified in the instruction rather than the amount
plus 32 bits.

instruction 1:      mult rs,rt          integer multiply
instruction 2-12:   dsra32 rd,rt,rs     doubleword shift right arithmetic + 32

Workaround: A dsra32 instruction placed after an integer multiply should not be one of the 11
instructions after the multiply instruction.

17. R4000SC: This problem occurs when the system interface of the R4000SC is configured with 
ECC checking. During the writeback of a dirty cache line, the first double word from the secondary 
cache which follows a double word from the primary cache is driven with an incorrect ECC code 
on the SysADC bus. The SysADC bus retains the same code for the secondary cache double word 
as was driven for the primary cache double word. 

Workaround: Use parity checking on the system interface bus (SysAD) rather than ECC. Alter- 
natively, mask all checking errors by setting the DE bit of the Diagnostic Status Field in the Status 
register to one. 

18. R4000SC: Under some conditions, stores which address the same primary cache line (lower 
12 bits are the same), will prevent the R4000 from responding to an external request until all the 
secondary cache accesses are complete. The conditions under which this occur are the following: 

1. Back to back stores which address the same cache line where the tags match. 

2. Back to back stores which address a different address but map to the same cache line (lower 12 
bits of the address are identical). In this case, the stores will result in the replacement of the line. 
If the accesses to the secondary cache map to the same location, the problem does not occur be- 
cause the secondary cache interface is idle during the writeback. In this case, the external request 
will be accepted during the writeback. 

3. Successive stores, not necessarily immediately following one another, which map to the same 
cache line under the conditions listed in 1 and 2 above but where the length of the loop is short 
enough that the secondary cache bus is never idle. The number of instructions in the loop where 
the problem can be observed is dependent upon the secondary cache timing parameters pro- 
grammed by the boot time mode bits. 

19. R4000PC, R4000SC: When there is a store followed by a load to the same address and the 
Watch register contains the address for the load, the Watch exception for the load is not taken. 

Workaround: A "noop" placed between the store and the load will enable the Watch exception to
be taken correctly.

22. R4000PC, R4000SC: When returning from 32-bit kernel mode to 64-bit user mode, the 
R4000 interprets the address of the first instruction as a 32-bit mode address. This may cause an 
incorrect mapping of the address depending upon the virtual address. 

Workaround: When operating the R4000 in 64-bit user mode, only use 64-bit kernel mode (KX=1
in the CP0 Status register).

23. R4000PC, R4000SC: The 64-bit instruction, daddi, fails to take an overflow exception. 
Workaround: There is no workaround for this problem. 

24. R4000PC, R4000SC: The R4000 does not take a Reserved Instruction Exception on the "rfe" 
instruction. The R4000 executes a "noop" and kills the following instruction. 

Workaround: There is no workaround for this problem. 

25. R4000PC, R4000SC: The "dmtc0" and "dmfc0" instructions do not cause a Reserved
Instruction Exception in Supervisor mode. These instructions will complete successfully.

Workaround: There is no workaround for this problem.

26. R4000SC: Sequential ordering must be used with the R4000SC. The option to turn off
sequential ordering does not function.

27. R4000PC, R4000SC: A TLB refill exception occurs on an instruction fetch with a value in the
CP0 register BadVAddr that does not match CP0 register EPC.

The specific case found involved the sequence: 

sw 

nop 

jal 

cvt.s.w 

mul.d 

The store takes a data cache miss. In the restart sequence at the end of the data cache miss, the
pipeline is backed up and the floating point scheduler causes a pipeline slip to occur. In this particular
case the slip that is generated as the data cache stall completes causes the data address instead of
the instruction address to be sent to the TLB. This results in an undefined address being given to
the TLB for translation. This undefined address may cause a TLB refill exception to be taken if the
undefined address is not present in the TLB.

Looking at the general case that could cause this problem to occur, the sequence of instructions 
required is: 

1) an instruction causes a data cache stall (load, store or cacheop) 

2) any instruction that will run without causing a slip or stall to occur 

3) jump or taken branch that does not have its destination mapped in the tlb 

4) pipeline activity that causes a slip in the restart sequence of the data cache stall

Workaround: 

The operating system (OS) will normally terminate a process that tries to access an address outside
the expected range for the process. When the OS detects such an address in BadVAddr, it should
check EPC to make sure it is within a valid range for the process. If EPC is not within the valid
range, the OS should execute an "eret" instruction. The refill instruction will be re-taken and
BadVAddr will contain the correct value.

If the OS is unable to determine the valid address range for the process, the value in EPC should 
be used to look for a load or store instruction. If EPC does not point to a load or store, the OS should 
execute an "eret". The "eret" will then cause another TLB refill exception, which will have a valid 
BadVAddr. If EPC points to a load or store, the OS must then interpret the instruction to generate 
the address for the data. If this address matches the address in BadVAddr, the process tried to ac- 
cess data outside the process address space. Otherwise the OS should execute an "eret" causing a 
TLB refill exception where the value in BadVAddr will be valid. 
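
The second form of the workaround, used when the OS cannot determine the valid address range, is a small decision procedure. The Python sketch below models only that decision; it is not operating system code. The names `Action` and `classify_refill` are hypothetical, and the caller is assumed to have already decoded the instruction at EPC and, for a load or store, computed its data address.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    RETRY_WITH_ERET = "execute eret and take the refill again"
    GENUINE_FAULT = "process accessed data outside its address space"


def classify_refill(bad_vaddr: int,
                    epc_is_load_or_store: bool,
                    data_address: Optional[int]) -> Action:
    """Decide how to treat a suspicious BadVAddr value.

    bad_vaddr             value read from BadVAddr
    epc_is_load_or_store  whether the instruction at EPC is a load or store
    data_address          address that instruction would access, if any
    """
    if not epc_is_load_or_store:
        # BadVAddr cannot come from this instruction; retry to get a valid one.
        return Action.RETRY_WITH_ERET
    if data_address == bad_vaddr:
        # The load or store really did reference the reported address.
        return Action.GENUINE_FAULT
    return Action.RETRY_WITH_ERET
```

Keeping the decision in one pure function makes the three outcomes described in the text easy to check in isolation.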

28. R4000PC, R4000SC: The text from errata 16 should be replaced by the following description. 

All extended shifts (shift by n+32) and variable shifts (32 and 64-bit versions) may produce
incorrect results under the following conditions:

1. An integer multiply is currently executing

2. These types of shift instructions are executed immediately following an integer
divide instruction.



Workaround: 



1. Make sure no integer multiply is running when these instructions are executed. If this cannot be
predicted at compile time, then insert a "mfhi" to R0 instruction immediately after the integer
multiply instruction. This will cause the integer multiply to complete before the shift is executed.

2. Separate integer divide and these two classes of shift instructions by another instruction or a
noop.

29. R4000SC: Errata 26 is incorrect. The R4000 always uses sequential ordering regardless of the 
state of the mode bit which specifies subblock or sequential ordering. 

30. R4000PC, R4000SC: When an nmi or softreset exception is taken, both exl and erl bits are set
to one in the Status register. The R4000 should preserve the previous state of exl.

31. R4000SC: The SCAPar pins on the secondary cache bus are driven incorrectly. The pins are 
driven with even parity for the secondary cache address, SCWrB, SCDCSB and SCTCSB pins. 

Workaround: These bits are not stored in the tag and are only used in systems which externally 
compare the address and parity bits. 

32. R4000PC, R4000SC: Under the following conditions, the CP0 register, BadVAddr, can be
corrupted.

1) There is a data cache miss which causes a jal (jump and link) to be stalled in the DS pipeline 
stage 

2) A floating point dependency causes a slip during the restart after the data for the data cache 
miss is returned. 

Workaround: 

33. R4000PC, R4000SC: With the following code sequence and conditions, the R4000 will use
the general exception vector, 0x80000180, instead of the TLB Exception vector,
0x80000000.

lw - takes a watch exception
lw - takes a watch exception
j  - TLB exception
--> Page boundary

Workaround: The general exception handler needs to handle refill exceptions directly from user 
code rather than only within the refill exception handler. 

34. R4000PC, R4000SC: The R4000 incorrectly allows xkseg to access up to 0xc000 0100 0000
0000. This region should only extend to 0xc000 00ff 8000 0000.
Workaround:

35. R4000PC, R4000SC: Split secondary cache mode can cause invalid cache error exceptions. 

Workaround: Use unified secondary cache mode. 

36. R4000PC, R4000SC: The first instruction of the ECC handler cannot write to a general
purpose register.

Workaround: The first instruction of the ECC exception handler should be a Noop. 

37. R4000PC, R4000SC: The PIDx field of the CP0 register CacheErr gets the wrong value on
Data and Instruction parity errors. ECC errors will put the correct values in this register,
however.

Workaround: Under this failure condition, all possible values of PIDx in the primary cache must
be checked for parity errors. In the R4000 with 8K primary caches, there are only two values for
the primary cache index that must be checked.

38. R4000PC, R4000SC: A parity error on the primary cache dirty data bit, W, will not be 
detected. 

Workaround: There is no workaround for this condition. 

39. Under the following conditions, the TLB attributes for a page being refilled into the microTLB
may be associated with a page mapped in the microTLB.

1) The address of a jump instruction is to the last instruction in a page. 

2) The next instruction after the jump target is not mapped in the microTLB. 

3) The jump is stalled in the DS pipeline stage. 

When the page targeted by the jump is refilled into the microTLB, the coherency bits associated 
with that page may be incorrect. 

Workaround: This problem occurs if the TLB attributes are different for the two pages. Under 
these conditions, if the TLB attributes are the same, this problem will not occur. 



